We present first evidence for the production of single top quarks in the DØ detector at the Fermilab
Tevatron $p\bar{p}$ collider. The standard model predicts that the electroweak interaction can produce a
top quark together with an antibottom quark or light quark, without the antiparticle top quark
partner that is always produced from strong coupling processes. Top quarks were first observed
in pair production in 1995, and since then, single top quark production has been searched for
in ever larger datasets. In this analysis, we select events from a 0.9 fb$^{-1}$ dataset that have an
electron or muon and missing transverse energy from the decay of a W boson from the top quark
decay, and two, three, or four jets, with one or two of the jets identified as originating from a
b hadron decay. The selected events are mostly backgrounds such as W+jets and $t\bar{t}$ events, which
we separate from the expected signals using three multivariate analysis techniques: boosted decision
trees, Bayesian neural networks, and matrix element calculations. A binned likelihood fit of the
signal cross section plus background to the data from the combination of the results from the three
analysis methods gives a cross section for single top quark production of
$\sigma(p\bar{p} \to tb + X,\ tqb + X) = 4.7 \pm 1.3$ pb. The probability to measure a cross section at this value or higher in the absence of
signal is 0.014%, corresponding to a 3.6 standard deviation significance. The measured cross section
value is compatible at the 10% level with the standard model prediction for electroweak top quark
production. We use the cross section measurement to directly determine the Cabibbo-Kobayashi-Maskawa
quark mixing matrix element that describes the Wtb coupling and find $|V_{tb} f_1^L| = 1.31^{+0.25}_{-0.21}$,
where $f_1^L$ is a generic vector coupling. This model-independent measurement translates into
$0.68 < |V_{tb}| \leq 1$ at the 95% C.L. in the standard model.



PACS numbers: 14.65.Ha; 12.15.Ji; 13.85.Qk

I. INTRODUCTION

A. Single Top Quarks 

Top quarks were first observed in top quark-top
antiquark pair production via the strong interaction
in 1995 [3, 4]. The standard model also predicts that the
electroweak interaction can produce a top quark together
with a bottom antiquark or a light quark, without the
antiparticle top quark partner that is always produced
in strong-coupling processes. This electroweak process
is generally referred to as single top quark production.
Since 1995, the DØ and CDF collaborations have been
searching ever larger datasets for signs of single top quark
production.

We present here the results of a search for top quarks
produced singly via the electroweak interaction from the
decay of an off-shell W boson or fusion of a virtual
W boson with a b quark. Previously measured top quarks
have been produced in pairs from highly energetic virtual
gluons via the strong interaction. The cross section for $t\bar{t}$
production at the Fermilab Tevatron proton-antiproton
collider (center-of-mass energy = 1.96 TeV) is 6.77 ±
0.42 pb at next-to-leading order (NLO) plus
higher-order soft-gluon corrections, for a top quark of mass
$m_{\rm top} = 175$ GeV. The standard model predicts
three processes for production of a top quark without
its antiparticle partner. These are as follows: (i) the
s-channel process $p\bar{p} \to t\bar{b} + X,\ \bar{t}b + X$, with a cross
section of 0.88 ± 0.14 pb at NLO for $m_{\rm top} = 175$ GeV;
(ii) the t-channel process $p\bar{p} \to tq\bar{b} + X,\ \bar{t}\bar{q}b + X$,
with a cross section of 1.98 ± 0.30 pb at the same
order in perturbation theory and top quark mass; and
(iii) the tW process $p\bar{p} \to tW^- + X,\ \bar{t}W^+ + X$,
where the cross section at the Tevatron energy is small,
0.08 ± 0.02 pb at LO.

The main tree-level Feynman diagrams for the
dominant single top quark production processes are
illustrated in Fig. 1. For brevity, in this paper we will use
the notation "tb" to mean the sum of $t\bar{b}$ and $\bar{t}b$, and "tqb"
to mean the sum of $tq\bar{b}$ and $\bar{t}\bar{q}b$. The analysis reported in
this paper searches only for the s-channel process tb and
the t-channel process tqb, and does not include a search
for the tW process because of its small production rate
at the Tevatron.







FIG. 1: Main tree-level Feynman diagrams for (a) s-channel
single top quark production, and (b) t-channel production. 



Top quarks are interesting particles to study since in 
the standard model their high mass implies a Yukawa 
coupling to the Higgs boson with a value near unity, 
unlike any other known particle. They also decay before 
they hadronize, allowing the properties of a bare quark 
such as spin to be transferred to its decay products 
and thus be measured and compared to the standard 
model predictions. Events with single top quarks can 
also be used to study the Wtb coupling,
and to measure directly the absolute value of the quark
mixing matrix (the Cabibbo-Kobayashi-Maskawa (CKM)
matrix) element $|V_{tb}|$ without assuming there are
only three generations of quarks. A measured
value for $|V_{tb}|$ significantly different from unity could
imply the existence of a fourth quark family or other
effects from beyond the standard model.



B. Search History 

The DØ collaboration has published three searches
for single top quark production using smaller datasets.
We analyzed 90 pb$^{-1}$ of data from Tevatron Run I
(1992-1996 at a center-of-mass energy of 1.8 TeV),
which resulted in the first upper limits on single top
quark production [20], and we performed a more refined
search using neural networks that achieved greater
sensitivity [21]. In Run II, we used 230 pb$^{-1}$ of data
collected from 2002 to 2004 to set more stringent upper
limits [22, 23]. Our best published 95% C.L. upper
limits are 6.4 pb in the s-channel (tb production) and
5.0 pb in the t-channel (tqb production). Students in
the DØ collaboration have completed ten Ph.D.
dissertations on the single top quark search [24]. Our most
recent publication [25] presents first evidence for single
top quark production using a 0.9 fb$^{-1}$ dataset. We
provide a more detailed description of that result here,
and also include several improvements to the analysis
methods that lead to a final result on the same dataset
with slightly higher significance.

The CDF collaboration has published two results from
analyzing 106 pb$^{-1}$ of Run I data [26, 27], and one that
uses 162 pb$^{-1}$ of Run II data [28]. Their best 95%
C.L. upper limits are 14 pb in the s-channel, 10 pb
in the t-channel, and 18 pb in the s-channel and
t-channel combined. Students in the CDF collaboration
have completed seven Ph.D. dissertations on the single
top quark search [29].



C. Search Method Overview 

The experimental signal for single top quark events
consists of one isolated high transverse momentum
($p_T$), central pseudorapidity ($\eta$ [30]) charged lepton and
missing transverse energy ($\not\!\!E_T$) from the decay of a
W boson from the top quark decay, accompanied by
a b jet from the top quark decay. There is always a
second jet, which originates from a b quark produced
with the top quark in the s-channel, or which comes
from a forward-traveling up- or down-type quark in
t-channel events. Some t-channel events have a detectable
b jet from the gluon splitting to $b\bar{b}$. Since there may be
significant initial-state or final-state radiation, we include
in our search events with two, three, or four jets. We
use data collected with triggers that include an electron
or a muon, and a jet. In the electron channel, multijet
events can fake signal ones when a jet is misidentified as
an electron, and we have stringent identification criteria
for electrons to reduce this type of background. In the
muon channel, $b\bar{b}$+jets events can fake signal ones when
one of the b's decays to a muon. We reject much of this
background by requiring the muon to be isolated from
all jets in the event. Finally, we apply a set of simple
selection criteria to retain regions of phase space that
single top quark events tend to populate.

We divide the selected events into 12 nonoverlapping
samples, referred to as analysis channels, depending
on the flavor of the lepton (e or μ), the number of
jets (2, 3, 4), and the number of jets identified as
originating from b quarks (number of "tagged" jets = 1,
2), because the signal-to-background ratios and fractions
of expected signal in each channel differ significantly.
The dominant background in most of these channels
is W+jets events. We model this background using
events simulated with Monte Carlo (MC) techniques and
normalized to data before b tagging. We also use an
MC model to simulate the background from $t\bar{t}$ events.
Finally, we use data events with poorly identified leptons
to model the multijet background where a jet is
misidentified as an electron, or a muon in a jet from c or b decay
is misidentified as a muon from a W boson decay. We
apply a neural-network-based b-identification algorithm
to each jet in data and keep events with one or two jets
that are identified as b jets. We model this b tagging in
the MC event samples by weighting each event by the
probability that one or more jets is tagged.
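
As an illustration of the event weighting described above, the following Python sketch computes the probability that exactly one, exactly two, or at least one jet in a simulated event is tagged, given a per-jet tag probability. This is a minimal sketch and not the collaboration's code: the function name `tag_weights` is hypothetical, and the per-jet probabilities in the example are placeholders (in practice they would depend on the jet flavor and kinematics).

```python
from itertools import combinations
from math import prod


def tag_weights(tag_probs):
    """Return (P(exactly 1 tag), P(exactly 2 tags), P(at least 1 tag)).

    tag_probs holds one assumed, independent tag probability per jet.
    """
    n = len(tag_probs)

    def p_exactly(k):
        # Sum over every choice of k tagged jets; the others are untagged.
        total = 0.0
        for tagged in combinations(range(n), k):
            total += prod(
                tag_probs[i] if i in tagged else 1.0 - tag_probs[i]
                for i in range(n)
            )
        return total

    return p_exactly(1), p_exactly(2), 1.0 - p_exactly(0)


# Placeholder two-jet event: one b-like jet and one light-like jet.
w1, w2, w_any = tag_weights([0.47, 0.005])
```

For a two-jet event the single-tag and double-tag weights add up to the at-least-one-tag weight, which is the quantity named in the text.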

After event selection, we calculate multivariate 
discriminants in each analysis channel to separate 
as much as possible the expected signal from the 
background. We then perform a binned likelihood fit of 
the background model plus possible signal to the data 
in the discriminant output distributions and combine 
the results from all channels that improve the expected 
sensitivity. Finally, we calculate the probability that our 
data are compatible with background only, use the excess 
of data over background in each bin to measure the signal 
cross section, and calculate the probability that the data 
contains both background and signal produced with at 
least the measured cross section value. 
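
The relation between a background-only probability and a significance in standard deviations can be checked with a short Python sketch. It assumes the usual one-sided Gaussian convention, which the text does not state explicitly, and the function name is hypothetical. The input is the 0.014% probability quoted in the abstract.

```python
from statistics import NormalDist


def significance_from_p_value(p_value):
    """One-sided Gaussian significance, in standard deviations."""
    return NormalDist().inv_cdf(1.0 - p_value)


# 0.014% is the background-only probability quoted in the abstract.
z = significance_from_p_value(0.00014)
print(f"{z:.1f} standard deviations")
```

Under this convention the result rounds to the 3.6 standard deviations quoted in the abstract.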

For each potential analysis channel, the relevant details
are the signal acceptance and the signal-to-background
ratio. Table I shows the percentage of the total signal
acceptance for each jet multiplicity and number of
b-tagged jets, and the associated signal-to-background
ratios. We used this information to determine that the
most sensitive channels have two, three, or four jets, and
one or two b tags. In the future, it could be beneficial to
extend the analysis to include events with only one jet,
b tagged, since the signal-to-background ratios are not
bad, and to study the untagged events with two or three
jets where there is significant signal acceptance.

TABLE I: Percentage of total selected MC single top quark
events (i.e., all channels shown in the table) for each jet
multiplicity and number of b-tagged jets, and the associated
signal-to-background ratios, for the electron and muon
channels combined. The values shown in bold type are for
the channels used in this analysis.

D. Differences from Previous Searches

We summarize here the changes and improvements
made to the analysis since the previously published
DØ result that used 230 pb$^{-1}$ of data [22, 23]. The
most important difference is that we have analyzed a
dataset four times as large. Other changes include the
following: (i) use of an improved model for the t-channel
tqb signal from the package SINGLETOP [31], based on
COMPHEP [32], which better reproduces NLO-like parton
kinematics; (ii) use of an improved model for the $t\bar{t}$
and W+jets backgrounds from the ALPGEN package [33]
that has parton-jet matching [34] implemented with
PYTHIA [36] to avoid duplicate generation of some
initial-state and final-state jet kinematics; (iii) determination
from data of the ratio of W boson plus $b\bar{b}$ or $c\bar{c}$ jets to
the total rate of W+jets production; (iv) omission of
a separate calculation of the diboson backgrounds WW
and WZ since they are insignificant; (v) differences in
electron, muon, and jet identification requirements and
minimum $p_T$'s; (vi) use of a significantly higher efficiency
b-tagging algorithm based on a neural network; (vii)
splitting of the analysis by jet and b-tag multiplicity
so as not to dilute the strength of high-acceptance,
good signal-to-background channels by mixing them with
poorer ones; (viii) simplification of the treatment of
the smallest sources of systematic uncertainty (since the
analysis precision is statistics dominated); (ix) use of
improved multivariate techniques to separate signal from
background; and (x) optimization of the search to find
the combined single top quark production from both the
s- and t-channels, tb+tqb.



II. THE DØ DETECTOR

The DØ detector [37] consists of three major parts:
a tracking system to determine the trajectories and
momenta of charged particles, a calorimeter to measure
the energies of electromagnetic and hadronic showers,
and a system to detect muons, which are the only charged
particles that are typically not contained within the
calorimeter. The first element at the core of the detector
is a tracking system that consists of a silicon microstrip
tracker (SMT) and a central fiber tracker (CFT), both
located within a 2 T superconducting solenoidal magnet.
The SMT has six barrel modules in the central region,
each comprising four layers arranged axially around the
beam pipe, and 16 radial disks interspersed with and
beyond the central barrels. Ionization charge is collected
by ≈ 800,000 p- or n-type silicon strips of pitch between
50 and 150 μm that are used to measure the positions
of the hits. Tracks can be reconstructed up to
pseudorapidities [30] of $|\eta^{\rm det}| \approx 3.0$.

The CFT surrounds the SMT with eight thin coaxial 
barrels, each supporting two doublets of overlapping 
scintillating fibers of 0.835 mm diameter, one doublet 
being parallel to the beam axis, and the other alternating 
by ±3° relative to the axis. Visible-light photon 
counters (VLPCs) collect the light signals from the 
fibers, achieving a cluster resolution of about 100 μm
per doublet layer. 

Central and forward preshower detectors contribute 
to the identification of electrons and photons. The 
central preshower detector is located just outside of the 
superconducting coil and the forward ones are mounted 
in front of the endcap calorimeters. The preshower 
detectors comprise several layers of scintillator strips that 
are read out using wavelength-shifting fibers and VLPCs. 

Three finely grained uranium/liquid-argon sampling
calorimeters constitute the primary system used to
identify electrons, photons, and jets. The central
calorimeter (CC) covers $|\eta^{\rm det}|$ up to ≈ 1.1. The two
end calorimeters (EC) extend the coverage to $|\eta^{\rm det}| \approx 4.2$.
Each calorimeter contains an electromagnetic (EM)
section closest to the interaction region with
approximately 20 radiation lengths of material, followed by
fine and coarse hadronic sections with modules that
increase in size with distance from the interaction region
and ensure particle containment with approximately six
nuclear interaction lengths. In addition to the preshower
detectors, scintillators between the CC and EC provide
sampling of developing showers in the cryostat walls for
$1.1 < |\eta^{\rm det}| < 1.4$.

The three-layer muon system is located beyond the
calorimetry, with 1.8 T iron toroids after the first
layer to provide a stand-alone muon-system momentum
measurement. Each layer comprises tracking detectors
and scintillation trigger counters. Proportional drift
tubes 10 cm in diameter allow tracking in the region
$|\eta^{\rm det}| < 1$, and 1 cm mini drift tubes extend the tracking
to $|\eta^{\rm det}| < 2$.

Additionally, plastic scintillator arrays covering
$2.7 < |\eta^{\rm det}| < 4.4$ are used to measure the rate of inelastic
collisions in the DØ interaction region and calculate the
Tevatron instantaneous and integrated luminosities.

We select the events to be studied offline with a three- 
tiered trigger system. The first level of the trigger makes 
a decision based on partial information from the tracking, 
calorimeter, and muon systems. The second level of the 
trigger uses more refined information to further reduce 
the rate. The third trigger level is based on software 
filters running in a farm of computers that have access 
to all information in the events. 



III. TRIGGERS AND DATA 

The data were collected between August 2002 and
December 2005, with 913 ± 56 pb$^{-1}$ and 871 ± 53 pb$^{-1}$
of good quality events in the electron and muon channels,
respectively.

As the average instantaneous luminosity of the 
Tevatron has increased over time, the triggers used 
to collect the data have been successively changed to 
maintain background rejection. The requirements at the 
highest trigger level are the following, with the associated 
integrated luminosity included in parentheses: 

Electron+jets triggers

1. One electron with $p_T > 15$ GeV and two jets with
$p_T > 15$ GeV (103 pb$^{-1}$)

2. One electron with $p_T > 15$ GeV and two jets with
$p_T > 20$ GeV (227 pb$^{-1}$)

3. One electron with $p_T > 15$ GeV, one jet with
$p_T > 25$ GeV, and a second jet with $p_T > 20$ GeV
(289 pb$^{-1}$)

4. One electron with $p_T > 15$ GeV and two jets with
$p_T > 30$ GeV (294 pb$^{-1}$)

Muon+jets triggers

1. One lower-trigger-level muon with no $p_T$ threshold
and one jet with $p_T > 20$ GeV (107 pb$^{-1}$)

2. One lower-trigger-level muon with no $p_T$ threshold
and one jet with $p_T > 25$ GeV (278 pb$^{-1}$)

3. One muon with $p_T > 3$ GeV and one jet with
$p_T > 30$ GeV (252 pb$^{-1}$)

4. One isolated muon with $p_T > 3$ GeV and one jet
with $p_T > 25$ GeV (21 pb$^{-1}$)

5. One muon with $p_T > 3$ GeV and one jet with
$p_T > 35$ GeV (214 pb$^{-1}$)

The average efficiency of the electron+jets triggers 
is 87% for tb events and 86% for tqb events that pass 
the final selection cuts. The average efficiency of the 
muon+jets triggers is 87% for tb and 82% for tqb events. 

Note that for the electron+jets triggers, the electron 
usually satisfies one of the jet requirements, and thus 
there are usually only two independent objects required 
in each event (one electron and one jet). 
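
As a simple cross-check of the trigger lists above, the per-trigger integrated luminosities can be summed and compared with the totals quoted for each channel. The Python sketch below only restates numbers from the text; the variable names are illustrative.

```python
# Integrated luminosities in pb^-1, copied from the trigger lists above.
electron_trigger_lumi = [103, 227, 289, 294]
muon_trigger_lumi = [107, 278, 252, 21, 214]

total_electron = sum(electron_trigger_lumi)
total_muon = sum(muon_trigger_lumi)

print(total_electron, total_muon)
```

The electron sum reproduces the quoted 913 pb$^{-1}$. The muon sum is 872 pb$^{-1}$, one unit above the quoted 871 pb$^{-1}$, which is presumably an effect of rounding the individual trigger periods.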



IV. EVENT RECONSTRUCTION 

Physics objects are reconstructed from the digital 
signals recorded in each part of the detector. Particles 
can be identified by certain patterns and, when correlated 
with other objects in the same event, they provide the 
basis for understanding the physics that produced such 
signatures in the detector. 

A. Primary Vertices 

The location of the hard-scatter interaction point is
reconstructed by means of an adaptive primary vertex
algorithm [38]. This algorithm first selects tracks coming
from different interactions by clustering them according
to their z position along the nominal beam line. In
the second step, the location and width of the beam in
the transverse plane (perpendicular to the beam line)
are determined and then used to re-fit tracks, and each
cluster of tracks is associated with a vertex using the
"adaptive" technique that gives all tracks a weight and
iterates the fit. The third and last step consists of
choosing the vertex that has the lowest probability of
coming from a minimum bias interaction (a $p\bar{p}$ scatter
event), based on the $p_T$ values of the tracks assigned
to each vertex. The hard-scatter vertex is distinguished
from soft-interaction vertices by the higher average $p_T$
of its tracks. In multijet data, the position resolution
of the primary vertex in the transverse plane is around
40 μm, convoluted with a typical beam spot size of
around 30 μm.

B. Electrons 

Electron candidates are defined as clusters of energy
depositions in the electromagnetic section of the central
calorimeter ($|\eta^{\rm det}| < 1.1$) consistent in shape with an
electromagnetic shower. At least 90% of the energy of
the cluster must be contained in the electromagnetic
section of the calorimeter, $f_{\rm EM} > 0.9$, and the cluster
must satisfy the following isolation criterion:

$$\frac{E_{\rm total}(\mathcal{R} < 0.4) - E_{\rm EM}(\mathcal{R} < 0.2)}{E_{\rm EM}(\mathcal{R} < 0.2)} < 0.15, \qquad (1)$$

where E is the electron candidate's energy measured in
the calorimeter, and $\mathcal{R} = \sqrt{(\Delta\phi)^2 + (\Delta\eta)^2}$ is the radius
of a cone defined by the azimuthal angle $\phi$ and the
pseudorapidity $\eta$, centered on the electron candidate's
track if there is an associated track, or the calorimeter
cluster if there is not. Two classes of electrons are
subsequently defined and used in this analysis:

• Loose electron

A loose electron must pass the identification
requirements listed above. In addition, the energy
deposition in the calorimeter must be matched with
a charged particle track from the tracking detectors
with $p_T > 5$ GeV. Finally, a shower-shape
chi-squared, based on seven variables that compare
the values of the energy deposited in each layer of
the electromagnetic calorimeter with average
distributions from simulated electrons, has to satisfy
$\chi^2_{\rm cal} < 50$.

• Tight electron

A tight electron must pass the loose requirements,
and have a value of a seven-variable EM-likelihood
$\mathcal{L} > 0.85$. The following variables are used in
the likelihood: (i) $f_{\rm EM}$; (ii) $\chi^2_{\rm cal}$; (iii) $E_T^{\rm cal}/p_T^{\rm track}$,
the transverse energy of the cluster divided by
the transverse momentum of the matched track;
(iv) the $\chi^2$ probability of the match between the
track and the calorimeter cluster; (v) the distance
of closest approach between the track and the
primary vertex in the transverse plane; (vi) the
number of tracks inside a cone of $\mathcal{R} = 0.05$ around
the matched track; and (vii) the $\sum p_T$ of tracks
within an $\mathcal{R} = 0.4$ cone around the matched track.
The average tight electron identification efficiency
in data is around 75%.
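
The cone radius and the isolation criterion of Eq. (1) can be written out as a short Python sketch. This is an illustration under stated assumptions rather than the experiment's reconstruction code: the function names are hypothetical, and the cone energies are taken as already-summed inputs.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Cone distance R = sqrt(dphi^2 + deta^2), with dphi wrapped to [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def passes_em_isolation(e_total_r04, e_em_r02, max_fraction=0.15):
    """Isolation of Eq. (1): (E_total(R<0.4) - E_EM(R<0.2)) / E_EM(R<0.2) < 0.15."""
    if e_em_r02 <= 0.0:
        return False  # no EM core energy, so the ratio is undefined
    return (e_total_r04 - e_em_r02) / e_em_r02 < max_fraction
```

For example, a cluster with 40 GeV in the EM core and 44 GeV in the wider cone has an isolation fraction of 0.10 and passes, while one with 50 GeV in the wider cone (fraction 0.25) fails.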



C. Muons 

Muons are identified by combining tracks in the muon
spectrometer with central detector tracks. Muons are
reconstructed up to $|\eta^{\rm det}| = 2$ by first finding hits in
all three layers of the muon spectrometer and requiring
that the timing of these hits be consistent with the
muon originating in the center of the detector from
the correct proton-antiproton bunch crossing, thereby
rejecting cosmic rays. Secondly, all muon candidates
must be matched to a track in the central tracker, where
the central track must pass the following criteria: (i) $\chi^2$
per degree of freedom must be less than 4; and (ii) the
distance of closest approach between the track and the
primary vertex must be less than 0.2 mm if the track has
SMT hits and less than 2 mm if it does not. Two classes
of muons are then defined for this analysis:

• Loose-isolated muon

A loose muon must pass the identification
requirements given above. Loose muons must
in addition be isolated from jets. The distance
between the muon and any jet axis in the event
has to satisfy $\mathcal{R}$(muon, jet) > 0.5.

• Tight-isolated muon

A tight muon must pass the loose-isolation
requirement and additional isolation criteria as
follows: (i) the transverse momenta of all tracks
within a cone of radius $\mathcal{R} = 0.5$ around the muon
direction, except the track matched to the muon,
must add up to less than 20% of the muon $p_T$;
and (ii) the energy deposited in a cone of radius
$0.1 < \mathcal{R} < 0.4$ around the muon direction must be
less than 20% of the muon $p_T$.
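
The two muon isolation classes can be summarized as a pair of boolean checks. The Python sketch below is illustrative only: the function and argument names are hypothetical, and the cone sums are assumed to have been computed beforehand.

```python
def is_loose_isolated(min_dr_to_jet):
    """Loose isolation: the muon is farther than R = 0.5 from every jet axis."""
    return min_dr_to_jet > 0.5


def is_tight_isolated(muon_pt, track_pt_sum_r05, calo_energy_r01_r04, min_dr_to_jet):
    """Tight isolation: loose isolation plus track and calorimeter cone cuts.

    track_pt_sum_r05: summed pT of other tracks within R < 0.5 of the muon.
    calo_energy_r01_r04: calorimeter energy in the hollow cone 0.1 < R < 0.4.
    """
    track_ok = track_pt_sum_r05 < 0.2 * muon_pt
    calo_ok = calo_energy_r01_r04 < 0.2 * muon_pt
    return is_loose_isolated(min_dr_to_jet) and track_ok and calo_ok
```

With these definitions every tight-isolated muon is also loose-isolated, which matches the nesting of the two classes in the text.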



D. Jets 

We reconstruct jets based on calorimeter cell energies,
using the midpoint cone algorithm [39] with radius
$\mathcal{R} = 0.5$. Noisy calorimeter cells are ignored in the
reconstruction algorithm by only selecting cells whose
energy is at least four standard deviations above the
average electronic noise and any adjacent cell with
at least two standard deviations above the average
electronic noise.
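
The cell selection described above can be sketched on a toy grid. The Python example below is a simplified illustration, not the actual reconstruction: it assumes a flat two-dimensional grid of cells with eight-neighbor adjacency (the real calorimeter is three-dimensional), and it takes as input each cell's energy divided by its noise width. The function name is hypothetical.

```python
def select_cells(significance):
    """Keep cells at >= 4 sigma, plus any neighbor of such a cell at >= 2 sigma.

    significance: 2D list of (cell energy) / (electronic noise sigma).
    Returns the set of (row, col) indices that are kept.
    """
    n_rows, n_cols = len(significance), len(significance[0])
    seeds = {
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if significance[r][c] >= 4.0
    }
    kept = set(seeds)
    for r, c in seeds:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                if (dr or dc) and inside and significance[rr][cc] >= 2.0:
                    kept.add((rr, cc))
    return kept


# Toy grid: one 5-sigma cell with two 2-sigma neighbors, and an isolated
# 3-sigma cell in the corner that is not adjacent to any 4-sigma cell.
grid = [
    [0.5, 2.5, 0.3, 0.1],
    [1.0, 5.0, 2.1, 0.4],
    [0.2, 1.5, 0.6, 0.3],
    [0.1, 0.2, 0.4, 3.0],
]
kept_cells = select_cells(grid)
```

In this toy grid the isolated 3-sigma cell is dropped, while the two moderately energetic neighbors of the 5-sigma cell are retained.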

To reject poor quality or noisy jets, we require all
jets to have the following: (i) $0.05 < f_{\rm EM} < 0.95$
in the central region, with the lower cut looser in the
intercryostat and forward regions; (ii) fraction of jet $p_T$
in the coarse hadronic calorimeter layers < 0.4 in the
central region, with looser requirements in the forward
regions; and (iii) at least 50% of the $p_T$ of the jet, not
including the coarse hadronic layers, matched to energy
depositions in towers in Level 1 of the trigger in a cone of
radius $\mathcal{R} = 0.5$ around the jet axis in the central region,
with looser requirements in the forward regions.

Jet energy scale corrections are applied to convert
reconstructed jet energies into particle-level energies.
The energy of each jet containing a muon within
$\mathcal{R}$(muon, jet) < 0.5 (considered to originate from a
semileptonic c- or b-quark decay) is corrected to account
for the energy of the muon and the accompanying
neutrino (because that energy is not deposited in the
calorimeter and so would otherwise be undermeasured).
For this correction, it is assumed that the neutrino has
the same energy as the muon.

Jets that have the same $\eta$ and $\phi$ as a reconstructed
electron are removed from the list of jets to avoid
double-counting objects.

E. Missing Transverse Energy 

Neutrinos carry away momentum that can be inferred 
using momentum conservation in the transverse plane. 
The sum of the transverse momenta of undetected 
neutrinos is equal to the negative of the sum of the 
transverse momenta of all particles observed in the 
detector. In practice, we compute the missing transverse 
energy by adding up vectorially the transverse energies 
in all cells of the electromagnetic and fine hadronic
calorimeters. Cells in the coarse hadronic calorimeter are
only added if they are part of a jet. This raw quantity 
is then corrected for the energy corrections applied to 
the reconstructed objects and for the momentum of all 
muons in the event, corrected for their energy loss in the 
calorimeter. 
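
The raw missing transverse energy described above is the negative vector sum of the cell transverse energies. The Python sketch below shows only that first step; it omits the subsequent corrections for reconstructed objects and muons, and the function name and the (transverse energy, azimuth) input format are assumptions made for illustration.

```python
import math


def raw_missing_et(cells):
    """Return (magnitude, azimuth) of the raw missing transverse energy.

    cells: iterable of (transverse energy, azimuthal angle phi) pairs.
    """
    cells = list(cells)
    sum_x = sum(et * math.cos(phi) for et, phi in cells)
    sum_y = sum(et * math.sin(phi) for et, phi in cells)
    # The missing ET balances the visible transverse energy.
    met_x, met_y = -sum_x, -sum_y
    return math.hypot(met_x, met_y), math.atan2(met_y, met_x)


# Two back-to-back deposits of 30 and 10 (arbitrary energy units).
met, met_phi = raw_missing_et([(30.0, 0.0), (10.0, math.pi)])
```

In this example the imbalance of 20 units points opposite to the larger deposit, as expected for an undetected particle recoiling against it.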



F. b Jets 

Given that single top quark events have at least
one b jet in the final state, we use a b-jet tagger to
identify jets originating from b quarks. In addition to
the jet quality criteria described in previous sections,
a "taggability" requirement is applied. This requires
the jets to have at least two good quality tracks with
$p_T > 1$ GeV and $p_T > 0.5$ GeV, respectively, that
include SMT hits and which point to a common origin.
A neural network (NN) tagging algorithm is used to
identify jets originating from a b quark. The tagger
and its performance in the data are described in detail
in Ref. [40]. We summarize briefly here its main
characteristics. The NN tagger uses the following variables,
ranked in order of separation power, to discriminate
b jets from other jets: (i) decay length significance of
the secondary vertex reconstructed by the secondary
vertex tagger (SVT); (ii) weighted combination of the
tracks' impact parameter significances; (iii) jet lifetime
probability (JLIP), the probability that the jet originates
from the primary vertex [41]; (iv) $\chi^2$ per degree of
freedom of the SVT secondary vertex; (v) number of
tracks used to reconstruct the secondary vertex; (vi) mass
of the secondary vertex; and (vii) number of secondary
vertices found inside the jet.

For this analysis, we require the NN output to be
greater than 0.775 for the jet to be considered b tagged.
The average probability for a light jet in data to be falsely
tagged at this operating point is 0.5%, and the average
b-tagging efficiency in data is 47% for jets with $|\eta^{\rm det}| < 2.4$.

V. SIMULATED EVENT SAMPLES 

A. Event Generation 

For this analysis, we generate single top quark
events with the COMPHEP-SINGLETOP [31, 32] Monte
Carlo event generator. SINGLETOP produces events
whose kinematic distributions match those from NLO
calculations. The top quark mass is set to 175 GeV,
the set of parton distribution functions (PDF) is
CTEQ6L1 [42], and the renormalization and
factorization scales are $m_{\rm top}^2$ for the s-channel and $(m_{\rm top}/2)^2$ for
the t-channel. These scales are chosen such that the LO
cross sections are closest to the NLO cross sections [43].
The top quarks and the W bosons from the top quark
decays are decayed in SINGLETOP to ensure the spins are
properly transferred. PYTHIA [36] is used to add the
underlying event, initial-state and final-state radiation,
and for hadronization. TAUOLA [44] is used to decay
tau leptons, and EVTGEN [45] to decay b hadrons. To
calculate the expected number of signal events, these
samples are normalized to the NLO cross sections for
a top quark mass of 175 GeV: 0.88 ± 0.14 pb for the
s-channel and 1.98 ± 0.30 pb for the t-channel.
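
Normalizing a simulated sample to a cross section amounts to giving each generated event a weight proportional to the cross section times the integrated luminosity. The Python sketch below shows this standard bookkeeping; it is an illustration, not the analysis code. The function name is hypothetical, and the branching fraction and number of generated events in the example are placeholders, not values from this paper.

```python
def mc_event_weight(cross_section_pb, branching_fraction,
                    luminosity_pb_inv, n_generated):
    """Weight per generated event so that the sample sums to sigma * B * L."""
    return cross_section_pb * branching_fraction * luminosity_pb_inv / n_generated


# s-channel cross section (0.88 pb) and electron-channel luminosity
# (913 pb^-1) are from the text; 0.1 and 100000 are placeholders.
weight = mc_event_weight(0.88, 0.1, 913.0, 100_000)
expected_before_selection = weight * 100_000
```

The expected yield after selection would then be the sum of these weights, multiplied by the correction factors, over the simulated events that pass all requirements.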

The W+jets and $t\bar{t}$ samples are generated using
ALPGEN [33]. The version we use includes a
parton-jet matching algorithm that follows the MLM
prescription [34, 35]. For the $t\bar{t}$ samples, the top quark
mass is set to 175 GeV, the scale is $m_{\rm top}^2 + \sum p_T^2({\rm jets})$,
and the PDF set is CTEQ6L1. For the W+jets events,
the PDF is also CTEQ6L1 and the scale is $m_W^2 + p_T^2(W)$.
The W+jets events include separate generation of each
jet multiplicity from W + 0 light partons to W + at least 5
light partons for events with no heavy-flavor partons (we
refer to these samples as Wjj). Those with $b\bar{b}$ and $c\bar{c}$
partons have separately generated samples with between
0 and 3 additional light partons. The $t\bar{t}$ events include
separate samples with additional jets from 0 to 2 light
partons.

For the W+jets sets, we remove events with heavy
flavor jets added by PYTHIA so as not to duplicate the
phase space of those generated already by ALPGEN. The
Wcj subprocesses are included in the Wjj sample with
massless charm quarks.

Since the W+jets background is normalized to data
(see Sec. VII A), it implicitly includes all sources of
W+jets, Z+jets, and diboson events with similar
jet-flavor composition, in particular Z+jets events where one
of the leptons from the Z boson decay is not identified.

The proportions of $Wb\bar{b}$ and $Wc\bar{c}$ in the W+jets model
are set by ALPGEN at leading order precision. However,
higher order calculations [46, 47, 48] indicate that there
should be a higher fraction of events with heavy-flavor
jets. We measure a scale factor for the $Wb\bar{b}$ and $Wc\bar{c}$
subsamples using several untagged data samples (with
zero b-tagged jets) that have negligible signal content.
We obtain:



$$N_{\rm data}^{\rm zero\text{-}tag} = \alpha\,(N_{Wb\bar{b}} + N_{Wc\bar{c}}) + N_{Wjj} + N_{t\bar{t}} + N_{\rm multijets}$$

$$\alpha = 1.50 \pm 0.45 \qquad (2)$$

where the numbers of events $N_i$ for each background
component correspond to the expected number of
events after event selection (described in Sec. VI)
and background normalization (described in Sec. VII)
and removing events with one or more b-tagged jets.
Additionally, we check that the same value of $\alpha = 1.5$
is obtained from the complementary W + 1 jet sample,
where we require the only jet to be b tagged. Figure 2
illustrates the measurement of the scale factor $\alpha$.
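
Solving the zero-tag relation of Eq. (2) for the scale factor gives $\alpha = (N_{\rm data} - N_{Wjj} - N_{t\bar{t}} - N_{\rm multijets}) / (N_{Wb\bar{b}} + N_{Wc\bar{c}})$ when the other yields are held fixed. The Python sketch below evaluates this expression. It is a simplified illustration: the function name is hypothetical, the yields in the example are invented placeholders chosen to give a round number, and any interplay with the overall W+jets normalization is ignored.

```python
def heavy_flavor_scale_factor(n_data, n_wbb, n_wcc, n_wjj, n_ttbar, n_multijet):
    """Solve N_data = alpha * (N_Wbb + N_Wcc) + N_Wjj + N_tt + N_multijets for alpha."""
    return (n_data - n_wjj - n_ttbar - n_multijet) / (n_wbb + n_wcc)


# Placeholder yields for one zero-tag sample (not values from the paper).
alpha = heavy_flavor_scale_factor(
    n_data=10_000, n_wbb=200, n_wcc=400,
    n_wjj=8_000, n_ttbar=100, n_multijet=1_000,
)
```

Repeating such a calculation in each untagged sample and averaging the results would correspond to the procedure summarized in the figure on the scale factor.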

We examine the distributions expected to suffer the
largest shape dependence from higher order corrections,
such as the invariant mass of the two leading jets and the
$p_T$ of the b-tagged jet, and find good agreement between
the shapes of the data and the background model, not
only in the signal region, but also in samples enriched
with W+jets events.

FIG. 2: Measurements of the scale factor $\alpha$ used to convert the
fraction of $Wb\bar{b}$ and $Wc\bar{c}$ events in the W+jets background
model from leading order to higher order. The points are
the measured correction factor in each dataset. The solid
line is the average of these values. The dot-dash inner band
shows the uncertainty from the fit to the eight data points.
The dashed outer line shows the uncertainty on $\alpha$ used in
the analysis to allow for the assumption that the scale factor
should be the same for $Wb\bar{b}$ and $Wc\bar{c}$, and for small differences
in the shapes of distributions between the W + heavy flavor
and W + light flavor jets.

Table II shows the cross sections, branching fractions,
initial numbers of events, and integrated luminosities of
the simulated samples used in this analysis.

B. Correction Factors 

We pass the simulated events through a GEANT-based
model [49] of the D0 detector. The simulated samples
then have correction factors applied to ensure that
the reconstruction and selection efficiencies match those
found in data. Generally the efficiency to reconstruct,
identify, and select objects in the simulated samples is
higher than in data, so the following scale factors are
used to correct for that difference:

• Trigger efficiency correction factors

The probability for each simulated event to fire
the triggers detailed in Sec. III is calculated as
a weight applied to each object measured in the
event. Electron and jet efficiencies, for all levels
of the trigger architecture, are parametrized as
functions of p_T and η^det. Muon efficiencies are
parametrized as functions of η^det and φ. These
corrections are measured using data obtained with
triggers different from those used in this search to
avoid biases.

• Electron identification efficiency correction
factors

We correct each simulated event in the electron
channel with a factor that accounts for the
differences in electron cluster finding, identification,
f_EM, and isolation efficiencies in the simulation and
data. This correction factor is measured in Z→ee
data and simulated events, and parametrized as a
function of η^det. A second scale factor is applied
to account for the differences between the data and
the simulation in the χ²_cal, track matching, and
EM-likelihood efficiencies. This second scale factor is
also derived from Z→ee data and simulated events
and parametrized as a function of η^det and φ^det.

TABLE II: Cross sections, branching fractions, initial numbers of events, and integrated
luminosities of the simulated event samples. Here, ℓ means e, μ, and τ.

Statistics of the Simulated Samples

Event Type          Cross Section     Branching Fraction     Number of Events     Integrated Luminosity
                        [pb]                                                           [fb^-1]
Signals
  tb → e+jets        0.88 ± 0.14       0.1111 ± 0.0022             92,620                  947
  tb → μ+jets        0.88 ± 0.14       0.1111 ± 0.0022            122,346                1,251
  tb → τ+jets        0.88 ± 0.14       0.1111 ± 0.0022             76,433                  782
  tqb → e+jets       1.98 ± 0.30       0.1111 ± 0.0022            130,068                  591
  tqb → μ+jets       1.98 ± 0.30       0.1111 ± 0.0022            137,824                  626
  tqb → τ+jets       1.98 ± 0.30       0.1111 ± 0.0022            117,079                  532
  Signal total       2.86 ± 0.45       0.3333 ± 0.0067            676,370
Backgrounds
  tt̄ → ℓ+jets        6.8 ± 1.2         0.4444 ± 0.0089            474,405                  157
  tt̄ → ℓℓ            6.8 ± 1.2         0.1111 ± 0.0089            468,126                  620
  Top pairs total    6.8 ± 1.2         0.5555 ± 0.0111            942,531
  Wbb̄ → ℓνbb̄         142               0.3333 ± 0.0066          1,335,146                   28
  Wcc̄ → ℓνcc̄         583               0.3333 ± 0.0066          1,522,767                    8
  Wjj → ℓνjj         18,734            0.3333 ± 0.0066          8,201,446                    1
  W+jets total       19,459            0.3333 ± 0.0067         11,059,359

• Muon identification and isolation efficiency
correction factors

We correct each simulated event in the muon
channel for the muon identification, track match,
and isolation efficiencies. The identification
correction factor is parametrized as a function of
η^det and φ, track match as a function of track-z
and η^det, and isolation as a function of the number
of jets in the event. These corrections are measured
in Z→μμ data and simulated events.

• Jet reconstruction efficiency and energy 
resolution correction factors 

Simulated jets need to be corrected for differences 
in the reconstruction and identification efficiency 
and for the worse energy resolution found in data 
than in the simulation. The jet energy scale 
correction is applied to the simulation as in the 
data, but then simulated jets are corrected for the 
jet reconstruction efficiency and smeared to match 
the jet energy resolution found in back-to-back 
photon+jet events. 



• Taggability and b-tagging efficiency
correction factors

In data, the taggability and b-tagging requirements
are applied directly, as described in Sec. IV F. For
simulated samples, taggability-rate functions and
tag-rate functions are applied instead of the direct
selection because the modeling of the detector
is not sufficiently accurate. The taggability-rate
function is parametrized in jet p_T, η, and primary
vertex z, and is measured in the selected data
sample (Sec. VI) with one loose-isolated lepton.
We check that the efficiency is the same as in the
data sample with one tight-isolated lepton within
the uncertainties. The average taggability for
central high-p_T jets is around 90%.

The b-jet efficiency correction is measured in data
using a muon-in-jet sample and a b-jet enriched
subset where one jet is required to have a small
JLIP value, and in an admixture of Z→bb̄ and tt̄
simulated events where the b jets are required to
contain a muon. The b-tag efficiency correction for
c-quark jets is derived in a combined MC sample
with Z boson, multijets, and tt̄ decays to c quarks,
assuming that the MC-to-data scale factor is
the same as for the b-jet efficiency. The b-tag
efficiency correction for light jets is derived from
multijet data. All these b-tagging corrections are
parametrized as functions of the jet p_T and η.
Figure 3 illustrates the tag-rate functions used in
this analysis.
FIG. 3: The tag-rate functions (TRFs) used to weight the MC events according to the probability that they should be b tagged.
In plots (a)-(d), the points show the neural network b-tagging algorithm (the "tagger") applied directly to the MC events. The
upper line that passes through the points is the result of the tag-rate functions, before scaling-to-data, being applied to the
MC events to reproduce the result from the tagger. The lower line, with dotted error band, shows the tag-rate functions after
they have been scaled to match the efficiency of the NN b-tagging algorithm applied to data. In plot (e), the lines show the
(scaled) tag-rate functions that are applied to MC events.
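
One common way to turn per-jet tag probabilities into event weights is sketched below. It assumes that each jet's probability (the product of its taggability and tag-rate function values) is already known and that jets are tagged independently. The function name and the example probabilities are hypothetical, and the text does not confirm that the analysis combines the per-jet values in exactly this way.

```python
from itertools import combinations
from math import prod


def tag_probabilities(jet_probs):
    """Return (P_exactly_one_tag, P_exactly_two_tags) for one simulated event.

    jet_probs holds one tagging probability per jet; jets are assumed independent.
    """
    indices = range(len(jet_probs))

    def exactly(k):
        total = 0.0
        # Sum over every choice of k tagged jets; all other jets must be untagged.
        for tagged in combinations(indices, k):
            total += prod(
                p if i in tagged else 1.0 - p for i, p in enumerate(jet_probs)
            )
        return total

    return exactly(1), exactly(2)


# Hypothetical two-jet event: one b-like jet and one light-like jet.
p_one, p_two = tag_probabilities([0.5, 0.1])
print(p_one, p_two)
```

Weighting every simulated event by these probabilities, rather than cutting on the tagger output, keeps the full simulated statistics in both the one-tag and two-tag samples.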
VI. EVENT SELECTION
A. Selection Requirements 

We apply a loose event selection to find W-like
events containing an isolated lepton, missing transverse
energy, and two to four jets with high transverse
momentum. The samples after this selection, which we
call "pretagged" (i.e., before tagging has been applied),
are dominated by W+jets events, with some tt̄ contribution
that becomes more significant for higher jet
multiplicities. The final selection improves the
signal-to-background ratio significantly by requiring the presence
of one or two b-tagged jets.

Common selections for both e and μ channels

• Good quality (for data with all subdetectors
working properly)

• Pass trigger: offline electrons and muons in the
data are matched to the object that fired the
appropriate trigger for that run period

• Good primary vertex: |z_PV| < 60 cm with at least
three tracks attached

• Missing transverse energy: 15 < E̸_T < 200 GeV

• Two, three, or four jets with p_T > 15 GeV and
|η| < 3.4

• Leading jet p_T > 25 GeV and |η| < 2.5

• Second leading jet p_T > 20 GeV

• Jet triangle cut |Δφ(leading jet, E̸_T)| vs. E̸_T (see
Fig. 8 in Ref. [23] for a pictorial view of these cuts):

|Δφ| < 1.5 + (π - 1.5) E̸_T(GeV)/35 rad

• One or two b-tagged jets

Electron channel selection

• Only one tight electron with p_T > 15 GeV and
|η^det| < 1.1

• No tight muon with p_T > 18 GeV and |η^det| < 2.0

• No second loose electron with p_T > 15 GeV and
any η

• Electron coming from the primary vertex:
|Δz(e, PV)| < 1 cm

• Electron triangle cuts |Δφ(e, E̸_T)| vs. E̸_T (see
Fig. 8 in Ref. [23]):

1. |Δφ(e, E̸_T)| > 2 - 2 E̸_T(GeV)/40 rad

2. |Δφ(e, E̸_T)| > 1.5 - 1.5 E̸_T(GeV)/50 rad

3. |Δφ(e, E̸_T)| < 2 + (π - 2) E̸_T(GeV)/24 rad

Muon channel selection

• Only one tight muon with p_T > 18 GeV and
|η^det| < 2.0

• No tight electron with p_T > 15 GeV and |η^det| <
2.5

• Muon coming from the primary vertex:
|Δz(μ, PV)| < 1 cm

• Muon triangle cuts |Δφ(μ, E̸_T)| vs. E̸_T (see Fig. 8
in Ref. [23]):

1. |Δφ(μ, E̸_T)| > 1.1 - 1.1 E̸_T(GeV)/80 rad

2. |Δφ(μ, E̸_T)| > 1.5 - 1.5 E̸_T(GeV)/50 rad

3. |Δφ(μ, E̸_T)| < 2.5 + (π - 2.5) E̸_T(GeV)/30 rad

Some of the selection criteria listed above are designed
to remove areas of the data that are difficult to model.
In particular, the upper E̸_T selection gets rid of a few
events where the muon p_T fluctuated to a large value.
The "triangle cuts" are very efficient in removing multijet
events where a misreconstructed jet creates fake missing
energy aligned or anti-aligned in azimuth with the lepton
or jet.
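
The triangle cuts listed above can be sketched as a single pass/fail function. The intercepts and E̸_T scales are copied from the selection lists; the function and variable names are hypothetical, and the sketch assumes azimuthal differences in radians and E̸_T in GeV.

```python
import math

# (intercept in rad, missing-E_T scale in GeV), copied from the selection lists.
LEPTON_CUTS = {
    "electron": {"lower": [(2.0, 40.0), (1.5, 50.0)], "upper": (2.0, 24.0)},
    "muon": {"lower": [(1.1, 80.0), (1.5, 50.0)], "upper": (2.5, 30.0)},
}
JET_UPPER = (1.5, 35.0)


def _below_upper(dphi, met, intercept, scale):
    # Rejects objects anti-aligned with the missing E_T at low missing E_T.
    return abs(dphi) < intercept + (math.pi - intercept) * met / scale


def _above_lower(dphi, met, intercept, scale):
    # Rejects objects aligned with the missing E_T at low missing E_T.
    return abs(dphi) > intercept - intercept * met / scale


def passes_triangle_cuts(channel, dphi_lepton, dphi_jet, met):
    """True if the event survives the jet and lepton triangle cuts."""
    cuts = LEPTON_CUTS[channel]
    return (
        _below_upper(dphi_jet, met, *JET_UPPER)
        and all(_above_lower(dphi_lepton, met, a, s) for a, s in cuts["lower"])
        and _below_upper(dphi_lepton, met, *cuts["upper"])
    )


print(passes_triangle_cuts("electron", dphi_lepton=1.5, dphi_jet=1.0, met=20.0))
print(passes_triangle_cuts("electron", dphi_lepton=0.5, dphi_jet=1.0, met=16.0))
```

Each cut is a straight line in the (E̸_T, |Δφ|) plane, so the rejected region is a triangle that closes as E̸_T grows, which is where the name comes from.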

Background-data selection for measuring the 
multijet background 

• All the same selection criteria as listed above except 
for the tight lepton requirements 

• Electron channel: only one loose-but-not-tight
electron

• Muon channel: only one loose-but-not-tight
muon

The definitions of loose and tight electrons and muons
are in Secs. IV B and IV C.



B. Numbers of Events After Selection 

Table III shows the numbers of events in the signal
and background samples and in the data after applying
the selection criteria. Note that these numbers are just
counts of events used later in the analysis, and not signal
or background yields after normalizations and corrections
have been applied.



VII. BACKGROUND MODEL 

A. W+Jets and Multijets Backgrounds 

The W+jets background is modeled using the
parton-jet matched ALPGEN simulated samples described in
Sec. V. This background is normalized to data before
b tagging, using a procedure explained below. Because
we normalize to data and do not use theory cross
sections, small components of the total background from
Z+jets and diboson processes (WW, WZ, and ZZ,
which amount to less than 4% of the total background
expectation after tagging) are implicitly included in the
W+jets part of the background model. This simplification
does not affect the final results because of the low
rate from these processes in the final selected dataset, and
because the kinematics of the events are similar to those
in W+jets events. They are thus identified together with
W+jets events by the multivariate discriminants.






TABLE III: Numbers of events for the electron and muon channels after selection. The MC samples include events
coming from τ decays, τ → ℓν, where ℓ = e in the electron channel and ℓ = μ in the muon channel.

Numbers of Events After Selection

                                Electron Channel                                 Muon Channel
                      1 jet   2 jets   3 jets   4 jets  ≥5 jets       1 jet   2 jets   3 jets   4 jets  ≥5 jets
Signal MC
  tb                  6,908   19,465    9,127    2,483      595       3,878   12,852    6,458    1,809      401
  tqb                 8,971   22,758   12,080    3,797    1,092       8,195   21,066   11,193    3,489      835
Background MC
  tt̄→ℓℓ               7,671   29,537   26,042   12,068    5,396       5,509   24,595   21,803    9,788    3,442
  tt̄→ℓ+jets             522    5,659   22,477   27,319   14,298         232    3,376   16,293   22,680    8,658
  Wbb̄                26,611   13,914    9,011    3,848    1,434      27,764   14,488    9,427    3,874    1,204
  Wcc̄                21,765   13,453    7,562    2,252      591      32,712   19,047   10,141    3,051      663
  Wjj               134,660   61,497   34,162    8,290    1,750     147,842   66,201   36,673    9,169    1,502
Pretag data
  Multijets          11,565    6,993    4,043    1,317      431         897      658      462      151       48
  Signal data        27,370    8,220    3,075      874      223      17,816    6,432    2,590      727      173
One-tag data
  Multijets             246      322      226       93       34          31       51       49       21        8
  Signal data           445      357      207       97       35         289      287      179      100       38
Two-tags data
  Multijets                       12       15       14        7                    3        4        1        4
  Signal data                     30       37       22       10                   23       32       27       10

The multijet background is modeled using datasets
that contain misidentified leptons, as described at the
end of Sec. VI A. These datasets provide the shape
for the multijet background component in each analysis
channel. They are normalized to data as part of the
W+jets normalization process.

We normalize the W+jets and multijet backgrounds
to data before tagging using the matrix method [50],
which lets us estimate how many events in the pretagged
samples contain a misidentified lepton (originating from
multijet production) and how many events have a real
isolated lepton (originating from W+jets or tt̄). Two
data samples are defined: the tight sample, which is the
signal sample after all selection cuts have been applied,
and the loose sample, where the same selection has been
applied but requiring only loose lepton quality. The
tight data sample, with N_tight events, is a subset of the
loose data sample with N_loose events. The loose sample
contains N_loose^real-ℓ events with a real lepton (signal-like
events, mostly W+jets and tt̄) and N_loose^fake-ℓ fake-lepton
events, which is the number of multijet events in the loose
sample.

We measure the probability ε_real-ℓ for a real isolated
lepton to pass the tight lepton selection in Z → ℓℓ
data events. The probability for a fake-isolated
lepton to pass the tight-isolated lepton criteria, ε_fake-ℓ,
is measured in a sample enriched in multijet events
with the same selection as the signal data but
requiring E̸_T < 10 GeV. In the electron channel,
these probabilities are parametrized as ε_real-e(p_T, η)
and ε_fake-e(N_jets, trigger period). In the muon channel,
they are parametrized as ε_real-μ(N_jets, p_T) and ε_fake-μ(η).
With these definitions, the matrix method is applied
using the following two equations:



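$$ N_{\text{loose}} = N^{\text{fake-}\ell}_{\text{loose}} + N^{\text{real-}\ell}_{\text{loose}} \qquad (3) $$

$$ N_{\text{tight}} = N^{\text{fake-}\ell}_{\text{tight}} + N^{\text{real-}\ell}_{\text{tight}} = \varepsilon_{\text{fake-}\ell}\, N^{\text{fake-}\ell}_{\text{loose}} + \varepsilon_{\text{real-}\ell}\, N^{\text{real-}\ell}_{\text{loose}} \qquad (4) $$

and solving for N_loose^fake-ℓ and N_loose^real-ℓ, so that the multijet
and the W-like contributions in the tight sample,
N_tight^fake-ℓ and N_tight^real-ℓ, can be determined.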
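
Equations (3) and (4) form a two-by-two linear system, and a minimal sketch of its solution is given below. The function name is hypothetical. The example inputs are the electron-channel two-jet entries of Table IV; because the efficiencies quoted there are averages given for illustration, the result only approximately reproduces the tabulated tight-sample counts.

```python
def matrix_method(n_loose, n_tight, eff_real, eff_fake):
    """Solve Eqs. (3) and (4); return (N_tight^fake, N_tight^real)."""
    if eff_real <= eff_fake:
        raise ValueError("the method needs eff_real > eff_fake")
    # Substitute N_loose^real = N_loose - N_loose^fake into Eq. (4) and solve.
    n_loose_fake = (eff_real * n_loose - n_tight) / (eff_real - eff_fake)
    n_loose_real = n_loose - n_loose_fake
    return eff_fake * n_loose_fake, eff_real * n_loose_real


# Electron channel, two jets (Table IV): N_loose, N_tight and average efficiencies.
n_fake_tight, n_real_tight = matrix_method(15213, 8220, 0.874, 0.193)
print(f"multijets in tight sample: {n_fake_tight:.0f}")
print(f"W-like in tight sample:    {n_real_tight:.0f}")
```

The two returned numbers always sum to N_tight, which is why the W+jets and multijet yields are fully anticorrelated in the normalization.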

The results of the matrix method normalization, which
we apply separately in each jet multiplicity bin, are
shown in Table IV. The values shown for ε_real-ℓ and
ε_fake-ℓ are averages for illustration only. The pretagged
background-data sample is scaled to N_tight^fake-ℓ, and the
W+jets simulated samples (Wbb̄+Wcc̄+Wjj) are scaled
to N_tight^real-ℓ after subtracting the expected number of tt̄
events in each jet multiplicity bin of the tight sample.
These normalization factors are illustrated in Fig. 4.

TABLE IV: Matrix method normalization values in the electron and muon channels for the loose and tight selected
samples, and the expected contribution from multijet and W-like events.

Normalization of W+Jets and Multijets Backgrounds to Data

                                Electron Channel                                 Muon Channel
                      1 jet   2 jets   3 jets   4 jets  ≥5 jets       1 jet   2 jets   3 jets   4 jets  ≥5 jets
N_loose              38,935   15,213    7,118    2,191      654      18,714    7,092    3,054      878      221
N_tight              27,370    8,220    3,075      874      223      17,816    6,432    2,590      727      173
ε_real-ℓ              0.873    0.874    0.874    0.875    0.875       0.991    0.989    0.987    0.961    0.878
ε_fake-ℓ              0.177    0.193    0.188    0.173    0.173       0.408    0.358    0.342    0.309    0.253
N_tight^fake-ℓ        1,691    1,433      860      256       86         498      329      223       56       10
N_tight^real-ℓ       25,679    6,787    2,215      618      137      17,319    6,105    2,369      669      162

FIG. 4: The factors used to normalize the W+jets background
model to pretagged data in each analysis channel.



B. Top-Quark Pairs Background 

Background from the tt̄ process is modeled using the
parton-jet matched ALPGEN simulated samples described
in Sec. V. These events are normalized to the theoretical
cross section at m_top = 175 GeV (chosen to match the
value used to generate the samples), which is 6.8 pb.



VIII. SIGNAL ACCEPTANCES 

Table V shows the percentage of each signal that
remains after selection. We achieve roughly 30% higher
acceptances in this analysis compared to our previously
published analysis [22, 23] from the use of the more
efficient neural network b-tagging algorithm. The total
acceptance for the s-channel tb process is (3.2 ± 0.4)%
and for the t-channel tqb process it is (2.1 ± 0.3)%.
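
As a rough cross-check of how an acceptance translates into an expected number of signal events, the sketch below multiplies a cross section, an acceptance, and an integrated luminosity. It assumes that the acceptances are quoted relative to the total signal cross section and uses the 0.9 fb^-1 dataset size quoted for this analysis; the function name is hypothetical, and the exact per-channel luminosities are not reproduced here.

```python
def expected_events(cross_section_pb, acceptance, luminosity_pb_inv):
    """Expected event count = cross section x acceptance x integrated luminosity."""
    return cross_section_pb * acceptance * luminosity_pb_inv


LUMINOSITY_PB_INV = 900.0  # 0.9 fb^-1 expressed in pb^-1 (assumed for both channels)

# s-channel tb, electron channel, two jets, before tagging:
# cross section from Table II, acceptance of 1.77% from Table V.
n_tb = expected_events(0.88, 0.0177, LUMINOSITY_PB_INV)
print(f"expected tb events: {n_tb:.1f}")
```

Under these assumptions the result is close to the corresponding pretagged tb yield quoted in the yield tables.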



TABLE V: Signal acceptances after selection.

Signal Acceptances

                                Electron Channel                                 Muon Channel
                      1 jet   2 jets   3 jets   4 jets  ≥5 jets       1 jet   2 jets   3 jets   4 jets  ≥5 jets
Before tagging
  tb                  0.55%    1.77%    0.83%    0.23%    0.06%       0.33%    1.36%    0.69%    0.19%    0.05%
  tqb                 0.52%    1.49%    0.79%    0.25%    0.07%       0.36%    1.17%    0.64%    0.20%    0.05%
One-tag
  tb                  0.24%    0.82%    0.39%    0.11%    0.03%       0.15%    0.64%    0.32%    0.09%    0.02%
  tqb                 0.18%    0.61%    0.34%    0.11%    0.03%       0.13%    0.50%    0.28%    0.09%    0.02%
Two-tags
  tb                           0.29%    0.14%    0.04%    0.02%                0.24%    0.12%    0.03%    0.01%
  tqb                          0.02%    0.05%    0.02%    0.01%                0.01%    0.04%    0.02%    0.01%


IX. EVENT YIELDS

We use the term "yield" to mean the number of events
of the signal or background in question predicted to be in
the 0.9 fb^-1 of data analyzed here. Tables VI, VII, and
VIII show these yields for all signals and backgrounds,
separated by lepton flavor and jet multiplicity within
each table, and by the number of b-tagged jets
between the tables. Because the W+jets and multijet
backgrounds are normalized to data before tagging, the
sum of the backgrounds is constrained to equal the
number of events observed in the data, as seen in the
first of these tables. The yield values shown in these tables have
been rounded to integers for clarity, so the sums of
the components will not always exactly equal the values
given for these sums. All calculations, however, have been
done with full-precision values.

Only events with two, three, and four jets are used
in this analysis, but we show the acceptances and the
yields for events with one jet and with five or more jets
in these tables to demonstrate the consistency of the
analysis in those channels. Tables VII and VIII show that
most of the signal is contained in the two-jet and three-jet
bins. However, as discussed in Sec. XIX A, our maximum
predicted sensitivity is obtained by including events with
two to four jets.

Table IX summarizes the signals, summed
backgrounds, and data for each channel, showing
the uncertainties on the signals and backgrounds, and
the signal-to-background ratios. Table X shows the
signal and background yields summed over electron and
muon channels and 1- and 2-tagged jets in the 2-jet,
3-jet, and 4-jet bins, and for the 2-, 3-, and 4-jet bins
combined.

Some basic kinematic distributions are shown for
electron channel events in Fig. 5 and for muon channel
events in Fig. 6. Since the yields are normalized before
b tagging, in each case the pretagged distributions are
shown in the first row of distributions and the one-tag
distributions are shown in the second row.

TABLE VI: Predicted yields after selection and before b tagging.

Yields Before b Tagging

                                Electron Channel                                 Muon Channel
                      1 jet   2 jets   3 jets   4 jets  ≥5 jets       1 jet   2 jets   3 jets   4 jets  ≥5 jets
Signals
  tb                      4       14        7        2                    3       10        5        1
  tqb                     9       27       14        5        1           6       20       11        3        1
Backgrounds
  tt̄→ℓℓ                   9       35       28       10        4           5       27       22        8        3
  tt̄→ℓ+jets               2       20      103      128       67           1       14       71       99       43
  Wbb̄                   659      358      149       42        5         431      312      161       47       10
  Wcc̄                 1,592      931      389       93       10       1,405    1,028      523      131       21
  Wjj                23,417    5,437    1,546      343       51      15,476    4,723    1,591      385       85
  Multijets           1,691    1,433      860      256       86         498      329      223       58       10
  Background Sum     27,370    8,220    3,075      874      223      17,816    6,434    2,592      727      172
  Data               27,370    8,220    3,075      874      223      17,816    6,432    2,590      727      173


TABLE VII: Predicted yields after selection for events with exactly one b-tagged jet.

Yields With One b-Tagged Jet

                                Electron Channel                                 Muon Channel
                      1 jet   2 jets   3 jets   4 jets  ≥5 jets       1 jet   2 jets   3 jets   4 jets  ≥5 jets
Signals
  tb                      2        7        3        1                    1        5        2        1
  tqb                     3       11        6        2        1           2        9        5        2
Backgrounds
  tt̄→ℓℓ                   4       16       13        5        2           2       13       10        4        1
  tt̄→ℓ+jets               1       11       47       58       30                    6       32       45       20
  Wbb̄                   188      120       50       14        2         131      110       56       16        4
  Wcc̄                    81       74       36        9        1          64       74       46       13        2
  Wjj                   175       61       20        5        1         125       58       23        6        2
  Multijets              36       66       48       18        7          17       26       24        8        2
  Background Sum        484      348      213      110       43         340      286      191       93       30
  Data                  445      357      207       97       35         289      287      179      100       38


TABLE VIII: Predicted yields after selection for events with exactly two b-tagged jets.

Yields With Two b-Tagged Jets

                                Electron Channel                                 Muon Channel
                      1 jet   2 jets   3 jets   4 jets  ≥5 jets       1 jet   2 jets   3 jets   4 jets  ≥5 jets
Signals
  tb                             2.3      1.1      0.3      0.1                  1.9      0.9      0.3      0.1
  tqb                            0.3      0.8      0.4      0.2                  0.2      0.7      0.4      0.1
Backgrounds
  tt̄→ℓℓ                          5.5      4.6      1.7      0.7                  4.6      3.8      1.4      0.5
  tt̄→ℓ+jets                      1.7     13.6     21.8     11.7                  1.0     10.2     18.0      8.1
  Wbb̄                           16.2      6.8      1.8      0.3                 15.3      8.2      2.3      0.6
  Wcc̄                            1.6      1.1      0.4      0.1                  1.6      1.5      0.5      0.1
  Wjj                            0.1      0.1      0.0      0.0                  0.1      0.1      0.0      0.0
  Multijets                      2.5      3.2      2.7      1.4                  1.5      1.9      0.4      0.8
  Background Sum                27.5     29.4     28.4     14.2                 24.1     25.7     22.7     10.1
  Data                            30       37       22       10                   23       32       27       10





TABLE IX: Summed signal and background yields after selection with total uncertainties, the numbers of data
events, and the signal-to-background ratio in each analysis channel. Note that the signal includes both s-channel and
t-channel single top quark processes.

Summary of Yields with Uncertainties

                                    Electron Channel                                              Muon Channel
                       1 jet         2 jets         3 jets         4 jets          1 jet         2 jets         3 jets         4 jets
Zero-tag
  Signal Sum           9 ± 2         21 ± 4         10 ± 2          3 ± 1          5 ± 1         15 ± 3          7 ± 2          3 ± 1
  Bkgd Sum      26,886 ± 626    7,845 ± 336    2,832 ± 144       735 ± 60   17,476 ± 515    6,124 ± 351    2,375 ± 178       610 ± 50
  Data                26,925          7,833          2,831            752         17,527          6,122          2,378            599
  Signal:Bkgd        1:3,104          1:378          1:286          1:259        1:3,253          1:407          1:320          1:292
One-tag
  Signal Sum           5 ± 1         18 ± 3          9 ± 2          3 ± 1          3 ± 1         14 ± 3          7 ± 2          2 ± 1
  Bkgd Sum          484 ± 86       348 ± 61       213 ± 30       110 ± 16       340 ± 63       286 ± 58       191 ± 34        93 ± 15
  Data                   445            357            207             97            289            287            179            100
  Signal:Bkgd           1:95           1:20           1:23           1:38          1:101           1:21           1:26           1:42
Two-tags
  Signal Sum                      2.6 ± 0.6      1.9 ± 0.4      0.7 ± 0.2                      2.1 ± 0.5      1.6 ± 0.4      0.6 ± 0.2
  Bkgd Sum                       27.5 ± 6.5     29.4 ± 5.7     28.4 ± 6.0                     24.1 ± 6.1     25.7 ± 5.5     22.7 ± 5.4
  Data                                   30             37             22                             23             32             27
  Signal:Bkgd                          1:10           1:15           1:39                           1:12           1:16           1:37


TABLE X: Yields after selection for the analysis channels combined.

Summed Yields (e+μ channels, 1+2 tags)

                               2 jets       3 jets       4 jets   2,3,4 jets
Signals
  tb                           16 ± 3        8 ± 2        2 ± 1       25 ± 6
  tqb                          20 ± 4       12 ± 3        4 ± 1       37 ± 8
Backgrounds
  tt̄→ℓℓ                        39 ± 9       32 ± 7       11 ± 3      82 ± 19
  tt̄→ℓ+jets                    20 ± 5     103 ± 25     143 ± 33     266 ± 63
  Wbb̄                        261 ± 55     120 ± 24       35 ± 7     416 ± 87
  Wcc̄                        151 ± 31      85 ± 17       23 ± 5     259 ± 53
  Wjj                        119 ± 25       43 ± 9       12 ± 2     174 ± 36
  Multijets                   95 ± 19      77 ± 15       29 ± 6     202 ± 39
  Background Sum            686 ± 131     460 ± 75     253 ± 42  1,398 ± 248
  Backgrounds + Signals     721 ± 132     480 ± 76     260 ± 43  1,461 ± 251
  Data                            697          455          246        1,398

FIG. 5: The first row shows pretagged distributions for the p_T of the electron, the p_T of the leading jet, and the reconstructed
W boson transverse mass. The second row shows the same distributions after tagging for events with exactly one b-tagged jet.
The hatched area is the ±1σ uncertainty on the total background prediction.

FIG. 6: The first row shows pretagged distributions for the p_T of the muon, the p_T of the leading jet, and the reconstructed
W boson transverse mass. The second row shows the same distributions after tagging for events with exactly one b-tagged jet.
The hatched area is the ±1σ uncertainty on the total background prediction.

X. SYSTEMATIC UNCERTAINTIES

We consider several sources of systematic uncertainty 
in this analysis and propagate them separately for 
each signal and background source throughout the 
calculation. Systematic uncertainties enter the analysis 
in two ways: as uncertainty on the normalization of 
the background samples and as effects that change 
the shapes of distributions for the backgrounds and 
the expected signals. The effect of these uncertainties 
on the discriminant outputs and how they affect the 
cross section measurement is described in Sec. XVIII B.
Table XI summarizes the relative uncertainties on each
of the sources described below.

The first uncertainties listed here affect only the tt̄
background normalization.

• Integrated luminosity 

At 6.1% [51], this is a small contribution to the tt̄
yield uncertainty.

• Theoretical cross section

The uncertainty on the tt̄ cross section includes
components for the choice of scale and PDF, and
also, more significantly, a large component from the
top quark mass uncertainty (i.e., using 175 GeV in
this analysis when the latest world average value is
170.9 ± 1.8 GeV). The combined uncertainty on the
cross section is taken as 18%.

The following uncertainties arise from the correction
factors and functions applied to the simulated samples to
make them match data, and thus affect both the signal
acceptances and the tt̄ background yield.

• Trigger efficiency 

Functions that represent the trigger efficiency for
each object type and trigger level as a function
of p_T, η^det, and φ are used to weight simulated
events. The functions are shifted up and down
by one standard deviation of the statistical error 
arising from the data samples used to calculate 
the functions and the weight of each event is 
recalculated. Fixed uncertainties of 3% in the 
electron channel and 6% in the muon channel are 
chosen since they encompass all the small variations 
seen in each analysis channel. 

• Primary vertex selection efficiency 

The primary vertex selection efficiencies in data
and in the simulation are not the same. We assign
a systematic uncertainty of 3% for the difference
between the beam profile along the longitudinal
direction in data and the simulated distribution.

• Electron reconstruction and basic identification
efficiency

The electron reconstruction and basic identification
correction factors are parametrized as a function of
η^det. The 2% uncertainty in the efficiency accounts
for its dependence on variables other than η^det
and for the limited data statistics used to
determine the correction factors.

• Electron shower shape, track match, and
likelihood efficiency

The electron shower shape, track match, and
likelihood correction factors are parametrized as
a function of η^det and φ. The 5% uncertainty in
the efficiency accounts for the dependence on other
variables, such as the number of jets and the
instantaneous luminosity, and for the limited data
statistics in determining these correction factors.

• Muon reconstruction and identification 
efficiency 

The correction factor uncertainty of 7% includes
contributions from the method used to determine
the correction functions, from the background
subtraction, and from the limited statistics in the
parametrization as a function of the η^det and φ of
the muon.

• Muon track matching and isolation 

The muon tracking correction functions have an
uncertainty that includes contributions from the
method used to measure the functions, from the
background subtraction, luminosity and timing
bias, and from averaging over φ and the limited
statistics in each bin used to calculate the functions.
The muon isolation correction uncertainty is
estimated based on its dependence on the number
of jets, and covers the dependences not taken into
account, such as p_T and η^det. The overall value of
these uncertainties combined is 2%.

• Jet fragmentation 

This systematic uncertainty covers the lack of
certainty in the jet fragmentation model (and
is measured as the difference between PYTHIA
and HERWIG fragmentation) as well as the
uncertainty in the modeling of initial-state and
final-state radiation. It is 5% for tb and tqb and
7% for tt̄.

• Jet reconstruction and identification 

The efficiency to reconstruct jets is similar in
data and simulated events, but the efficiency of
the simulated jets is nevertheless corrected by a
parametrization of this discrepancy as a function of
jet p_T. We assign a 2% error to the parametrization
based on the statistics of the data sample.

• Jet energy scale and jet energy resolution 

The jet energy scale (JES) is raised and lowered
by one standard deviation of its uncertainty
and the whole analysis is repeated. In the data, the
JES uncertainty contains the jet energy resolution
uncertainty. In the simulation, however, the jet energy
resolution uncertainty is not taken into account
in the JES uncertainty. To account for this, the
energy smearing in the simulated samples is varied
by the size of the jet energy resolution. This
uncertainty affects the acceptance and the shapes
of the distributions. The value of this uncertainty
varies from 1% to 20%, depending on the analysis
channel, with typical values between 6% and 10%.

The uncertainty on the W+jets and multijets
background yields comes from the normalization to
data. The W+jets yield is 100% anticorrelated with the
multijets yield.

• Matrix-method normalization 

The determination of the number of real-lepton 
events in data is affected by the uncertainties 
associated with the determination of the 
probabilities for a loose lepton to be (mis)identified
as a (fake) real lepton, $\varepsilon_{\text{fake-}\ell}$ and $\varepsilon_{\text{real-}\ell}$. The
normalization is also affected by the limited
statistics of the data sample as described in
Sec. VII A. The combined uncertainties on the
W+jets and multijets yields vary between 17%
and 28%, depending on the analysis channel. 

• Heavy flavor ratio 

The uncertainty on the scale factor applied to 
set the $Wb\bar{b}$ and $Wc\bar{c}$ fractions of the W+jets
sample, as described in Sec. V, is estimated to
cover several effects: dependence on the b-quark $p_T$,
the difference between the zero-tag samples where 
it is estimated and the signal samples where it is 
used, and the intrinsic uncertainty on the value of 
the LO cross section it is being applied to. This 
uncertainty is 30%. It is included in the matrix 
method uncertainty described above. 

There is one source of uncertainty that affects 
the signal acceptances, and both the $t\bar{t}$ and W+jets
background yields. 

• b-tag modeling 

The uncertainty associated with the taggability- 
rate and tag-rate functions is evaluated by raising 
and lowering the tag rate by one standard deviation 
separately for both the taggability and the tag rate 
components and determining the new event tagging 
weight. These uncertainties originate from several 
sources as follows: statistics of the simulated event 
sets; the assumed fraction of heavy flavor in the 
simulated multijet sample used for the mistag rate 
determination; and the choice of parametrizations. 
The b-tag modeling uncertainty varies from 2% to
16%, depending on the analysis channel, and we 
include the variation on distribution shapes, as well 
as on sample normalization. 



TABLE XI: Summary of the relative systematic
uncertainties. The ranges shown represent the
different samples and channels.

Relative Systematic Uncertainties

Integrated luminosity                         6%
$t\bar{t}$ cross section                      18%
Electron trigger                              3%
Muon trigger                                  6%
Primary vertex                                3%
Electron reconstruction & identification      2%
Electron track match & likelihood             5%
Muon reconstruction & identification          7%
Muon track match & isolation                  2%
Jet fragmentation                             (5-7)%
Jet reconstruction and identification         2%
Jet energy scale                              (1-20)%
Tag-rate functions                            (2-16)%
Matrix-method normalization                   (17-28)%
Heavy flavor ratio                            30%
$\varepsilon_{\text{real-}e}$                 2%
$\varepsilon_{\text{real-}\mu}$               2%
$\varepsilon_{\text{fake-}e}$                 (3-40)%
$\varepsilon_{\text{fake-}\mu}$               (2-15)%



XI. MULTIVARIATE ANALYSES

The search for single top quark production is
significantly more challenging than the search for $t\bar{t}$
production. The principal reasons are the smaller signal-
to-background ratio for single top quarks and the large
overlap between the signal distributions and those of the
backgrounds. We therefore concluded from the outset
that optimal signal-background discrimination would be
necessary to have any chance of extracting a single top
quark signal from the available dataset.

Optimal event discrimination is a well-defined problem
with a well-defined and unique solution. Given the
probability

$$ p(S|x) = \frac{p(x|S)\,p(S)}{p(x|S)\,p(S) + p(x|B)\,p(B)} \tag{5} $$

that an event described by the variables x is of the signal
class, S, the signal can be extracted optimally, that is,
with the smallest possible uncertainty [53], by weighting
events with p(S|x), or, as we have done, by fitting the
sum of distributions of p(S|x) for signal and background
to data, as described in Sec. XVIII. In practice, since
any one-to-one function of p(S|x) is equivalent to p(S|x),
it is sufficient to construct an approximation to the
discriminant



$$ D(x) = \frac{p(x|S)}{p(x|S) + p(x|B)} \tag{6} $$

built using equal numbers of signal and background
events, that is, with p(S) = p(B). Each of the three
analyses we have undertaken is based on a different
numerical method to approximate the discriminant D(x).
From this perspective, they are conceptually identical.
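
As a minimal numerical sketch of why Eq. (6) is sufficient, the Python
functions below evaluate Eqs. (5) and (6) for given density values and show
that p(S|x) can be recovered from D(x) for any prior p(S) through a monotonic
map. The function names and the numerical inputs are illustrative assumptions,
not part of the analysis software.

```python
def posterior(px_s, px_b, prior_s):
    """Eq. (5): p(S|x) from the densities p(x|S), p(x|B) and the prior p(S)."""
    prior_b = 1.0 - prior_s
    num = px_s * prior_s
    return num / (num + px_b * prior_b)


def discriminant(px_s, px_b):
    """Eq. (6): D(x), i.e. Eq. (5) evaluated with p(S) = p(B)."""
    return px_s / (px_s + px_b)


def posterior_from_discriminant(d, prior_s):
    """Recover p(S|x) from D(x) for an arbitrary prior p(S).

    The map is monotonic in d, so D(x) orders events exactly as p(S|x) does.
    """
    prior_b = 1.0 - prior_s
    return d * prior_s / (d * prior_s + (1.0 - d) * prior_b)


if __name__ == "__main__":
    # Hypothetical density values at one point x.
    d = discriminant(0.3, 0.1)
    print(d, posterior_from_discriminant(d, 0.02), posterior(0.3, 0.1, 0.02))
```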

In this paper, we present results from three different
multivariate techniques applied to the selected dataset:
boosted decision trees (DT) in Sec. XII, Bayesian neural
networks (BNN) in Sec. XIII, and matrix elements
(ME) in Sec. XIV. The DT analysis approximates
the discriminant D(x) using an average of many piecewise
approximations to D(x). The BNN analysis uses
nonlinear functions that approximate D(x) directly, that
is, without first approximating the densities p(x|S)
and p(x|B). The ME method approximates the
densities p(x|S) and p(x|B) semi-analytically, starting
with leading-order matrix elements, and computes D(x)
from them.

The three analyses also differ by the choice of variables 
used. The basic observables are:

1. missing transverse energy 2-vector ($\not\!\!E_T$, $\phi$),

2. lepton 4-vector ($E_T$, $\eta$, $\phi$), assuming massless
leptons,

3. jet 4-vector ($E_T$, $\eta$, $\phi$), assuming massless jets, and
jet-type, that is, whether it is a b jet or not, for
each jet.

These, essentially, are the observables used in the matrix 
element analysis. The other analyses, however, make use 
of physically motivated variables [54, 55] derived from
the fundamental observables. Of course, the derived
variables contain no more information than is contained
in the original degrees of freedom. However, for some
numerical approximation methods, it may prove easier
to construct an accurate approximation to D(x) if it is
built using carefully chosen derived variables than one
constructed directly in terms of the underlying degrees
of freedom. It may also happen that a set of judiciously
chosen derived variables, perhaps one larger than the
set of fundamental observables, yields better performing
discriminants simply because the numerical approximation
algorithm is better behaved or converges faster.

The complete set of variables used in the DT and BNN
analyses is shown in Table XII. Jets are sorted in $p_T$
and index 1 refers to the leading jet in a jet category:
"jetn" (n=1,2,3,4) corresponds to each jet in the event,
"tagn" to b-tagged jets, "untagn" to non-b-tagged jets,
and "notbestn" to all but the best jet. The "best" jet is
defined as the one for which the invariant mass M(W,jet)
is closest to $m_{\rm top}$ = 175 GeV.
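
The "best" jet definition above can be sketched as follows. This is only an
illustration under stated assumptions: four-vectors are plain (E, px, py, pz)
tuples in GeV, the reconstructed W boson four-vector is taken as given (its
reconstruction is not covered here), and the function names are hypothetical.

```python
import math

M_TOP = 175.0  # GeV, the reference value quoted in the text


def invariant_mass(*vectors):
    """Invariant mass of the sum of (E, px, py, pz) four-vectors, in GeV."""
    e, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative round-off


def best_jet_index(w_boson, jets):
    """Index of the jet whose M(W, jet) is closest to M_TOP."""
    return min(
        range(len(jets)),
        key=lambda i: abs(invariant_mass(w_boson, jets[i]) - M_TOP),
    )


if __name__ == "__main__":
    # Hypothetical event: a W boson at rest and three massless jets.
    w = (80.4, 0.0, 0.0, 0.0)
    jets = [(60.0, 60.0, 0.0, 0.0), (100.0, 0.0, 100.0, 0.0), (30.0, 0.0, 0.0, 30.0)]
    print(best_jet_index(w, jets))
```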

Aplanarity, sphericity, and centrality are variables that
describe the direction and shape of the momentum flow
in the events [56]. The variable H is the scalar sum of
the energy in an event for the jets as shown. $H_T$ is the
scalar sum of the transverse energy of the objects in the
event. M is the invariant mass of various combinations
of objects. $M_T$ is the transverse mass of those objects.
Q is the charge of the electron or muon.
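
The scalar sums and masses described above can be illustrated with a short
sketch. It assumes massless objects stored as (E, px, py, pz) tuples in GeV, so
that transverse energy equals transverse momentum, and it uses one common
definition of the transverse mass of a set of objects; the exact definitions
used in the analysis may differ, and the function names are hypothetical.

```python
import math


def pt(v):
    """Transverse momentum of a four-vector (E, px, py, pz)."""
    return math.hypot(v[1], v[2])


def h_scalar(objects):
    """H: scalar sum of the energies of the objects."""
    return sum(v[0] for v in objects)


def ht_scalar(objects):
    """H_T: scalar sum of transverse energies (E_T = p_T for massless objects)."""
    return sum(pt(v) for v in objects)


def mass(objects):
    """M: invariant mass of the summed four-vectors."""
    e, px, py, pz = (sum(v[i] for v in objects) for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def transverse_mass(objects):
    """M_T: sqrt((sum E_T)^2 - (sum px)^2 - (sum py)^2), an assumed definition."""
    et = ht_scalar(objects)
    px = sum(v[1] for v in objects)
    py = sum(v[2] for v in objects)
    return math.sqrt(max(et * et - px * px - py * py, 0.0))


if __name__ == "__main__":
    # Two hypothetical back-to-back jets of 50 GeV each.
    jets = [(50.0, 50.0, 0.0, 0.0), (50.0, -50.0, 0.0, 0.0)]
    print(h_scalar(jets), ht_scalar(jets), mass(jets), transverse_mass(jets))
```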

A selection of these variables is shown in Figs. 8
and 9 for the sum of all channels: electron plus
muon channels, two to four jets, and one or two
b-tagged jets. Figure 10 shows distributions for some of
the variables from Table XII for SM signals and the
background components, normalized to unit area, so that
the differences in shapes may be seen.

TABLE XII: Variables used with the decision trees and
Bayesian neural networks analyses, in three categories: object
kinematics; event kinematics; and angular variables. For the
angular variables, the subscript indicates the reference frame.
* indicates variables that were only used for the DT analysis,
† indicates variables only used by the BNN analysis.

DT and BNN Input Variables

Object Kinematics
  $p_T$(jet1)
  $p_T$(jet2)
  $p_T$(jet3)
  $p_T$(jet4)
  $p_T$(best)
  $p_T$(notbest1)*
  $p_T$(notbest2)*
  $p_T$(tag1)
  $p_T$(untag1)
  $p_T$(untag2)
  $p_T$($\ell$)†

Event Kinematics
  Aplanarity(alljets,W)
  Sphericity(alljets,W)
  Centrality(alljets)†
  H(alljets)†
  H(jet1,jet2)†
  $H_T$(alljets)
  $H_T$(alljets−best)*
  $H_T$(alljets−tag1)*
  $H_T$(alljets,W)
  $H_T$(jet1,jet2)
  $H_T$(jet1,jet2,W)
  M(alljets)
  M(alljets−best)*
  M(alljets−tag1)*
  M(jet1,jet2)
  M(jet1,jet2,W)
  M(W,best) (i.e., "best" $m_{\rm top}$)
  M(W,tag1) (i.e., "b-tagged" $m_{\rm top}$)
  $M_T$(jet1,jet2)
  $p_T$(W)
  $p_T$(alljets−best)
  $p_T$(alljets−tag1)
  $p_T$(jet1,jet2)
  $M_T$(W)
  Q($\ell$)×$\eta$(untag1)

Angular Variables
  cos(jet1,alljets)_alljets
  cos(jet2,alljets)_alljets
  cos(notbest1,alljets)_alljets
  cos(tag1,alljets)_alljets
  cos(untag1,alljets)_alljets
  cos(best,notbest1)_besttop
  cos(best,$\ell$)_besttop
  cos(notbest1,$\ell$)_besttop
  cos(Q($\ell$)×z($\ell$),$\ell$)_besttop
  cos(besttop_CMframe, $\ell$_besttop)
  cos(jet1,$\ell$)_btaggedtop
  cos(jet2,$\ell$)_btaggedtop
  cos(tag1,$\ell$)_btaggedtop
  cos(untag1,$\ell$)_btaggedtop
  cos(btaggedtop_CMframe, $\ell$_btaggedtop)
  cos(jet1,$\ell$)_lab
  cos(jet2,$\ell$)_lab
  cos(best,$\ell$)_lab
  cos(tag1,$\ell$)_lab
  $\Delta R$(jet1,jet2)*

FIG. 7: Comparison of SM signal, backgrounds, and data after selection and requiring at least one b-tagged jet for six
discriminating individual object variables. Electron and muon channels are combined. The plots show (a) the lepton transverse
momentum and (b) pseudorapidity, (c) the leading jet transverse momentum and (d) pseudorapidity, (e) the second leading jet
transverse momentum and (f) pseudorapidity. The hatched area is the ±1σ uncertainty on the total background prediction.

FIG. 8: Comparison of SM signal, backgrounds, and data after selection and requiring at least one b-tagged jet for six
discriminating event kinematic variables. Electron and muon channels are combined. Shown are (a) the invariant transverse
mass of the reconstructed W boson, (b) the invariant mass of the b-tagged jet and the W boson, (c) the invariant mass of
all jets, (d) the missing transverse energy, (e) the scalar sum of the transverse momenta of jets, lepton and neutrino, (f) the
invariant mass of all final state objects. The hatched area is the ±1σ uncertainty on the total background prediction.

FIG. 9: Comparison of SM signal, backgrounds, and data after selection and requiring at least one b-tagged jet for three
discriminating angular correlation variables. Electron and muon channels are combined. Shown are (a) the angular separation

FIG. 10: Shape comparison between the s- and t-channel signals and the backgrounds in the most discriminating variables for
the decision tree analysis chosen from the e+2jets/1tag channel. Shown are (a) the invariant mass of all jets, (b) the invariant
mass of the b-tagged jet and the W boson, (c) the invariant mass of the two leading jets and the W boson, (d) the cosine of
the b-tagged jet and the lepton in the reconstructed b-tagged top quark rest frame, (e) the charge of the lepton multiplied by
the pseudorapidity of the leading untagged jet, (f) the scalar sum of the transverse momenta of all the jets, (g) the invariant
mass of all jets minus the best jet, (h) the invariant transverse mass of the reconstructed W boson, and (i) the cosine of the
angle between the reconstructed b-tagged top quark in the center-of-mass rest frame and the lepton in the b-tagged top quark
rest frame. All histograms are normalized to unit area.



XII. BOOSTED DECISION TREES ANALYSIS

A decision tree [57, 58] employs a machine-learning
technique that effectively extends a simple cut-based 
analysis into a multivariate algorithm with a continuous 
discriminant output. Boosting is a process that can 
be used on any weak classifier (defined as any classifier 
that does a little better than random guessing). In this 
analysis, we apply the boosting procedure to decision 
trees in order to enhance separation of signal and 
background. 



A. Decision Tree Algorithm 

A decision tree classifies events based on a set of 
cumulative selection criteria (cuts) that define several 
disjoint subsets of events, each with a different signal 
purity. The decision tree is built by creating two branches 
at every nonterminal node, i.e., splitting the sample 
of events under consideration into two subsets based 
on the most discriminating selection criterion for that 
sample. Terminal nodes are called leaves and each leaf 
has an assigned purity value p. A simple decision tree 
is illustrated in Fig. 11. An event defined by variables x
will follow a unique path through the decision tree and
end up in a leaf. The associated purity p of this leaf
is the decision tree discriminant output for the event:
D(x) = p, with D(x) given in Eq. 6.




FIG. 11: Graphical representation of a decision tree. Nodes 
with their associated splitting test are shown as (blue) circles 
and terminal nodes with their purity values are shown as 
(green) leaves. An event defined by variables $x_i$, of which
$H_T$ < 242 GeV and $m_{\rm top}$ > 162 GeV will return $D(x_i)$ = 0.82,
and an event with variables $x_j$ of which $H_T$ > 242 GeV and
$p_T$ > 27.6 GeV will have $D(x_j)$ = 0.12. All nodes continue to
be split until they become leaves. (color online)
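
The path of an event through such a tree can be sketched in a few lines of
Python. The node layout and the two purities 0.82 and 0.12 follow the example
in the caption of Fig. 11; the other two leaf purities, the variable keys, and
the class and function names are hypothetical placeholders chosen only to make
the sketch complete.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    variable: str = ""                # variable tested at this node
    cut: float = 0.0                  # splitting value
    below: Optional["Node"] = None    # child for events with value < cut
    above: Optional["Node"] = None    # child for events with value >= cut
    purity: Optional[float] = None    # set only for leaves


def evaluate(node: Node, event: Dict[str, float]) -> float:
    """Follow the unique path of an event and return the leaf purity D(x)."""
    while node.purity is None:
        node = node.below if event[node.variable] < node.cut else node.above
    return node.purity


TREE = Node(
    "HT", 242.0,
    below=Node("mtop", 162.0,
               below=Node(purity=0.30),   # hypothetical purity
               above=Node(purity=0.82)),  # from the Fig. 11 example
    above=Node("pT", 27.6,
               below=Node(purity=0.45),   # hypothetical purity
               above=Node(purity=0.12)),  # from the Fig. 11 example
)

if __name__ == "__main__":
    print(evaluate(TREE, {"HT": 200.0, "mtop": 170.0, "pT": 40.0}))
    print(evaluate(TREE, {"HT": 300.0, "mtop": 170.0, "pT": 40.0}))
```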



One of the primary advantages of decision trees 
over a cut-based analysis is that events which fail 
an individual cut continue to be considered by the 
algorithm. Limitations of decision trees include the 
instability of the tree structure with respect to the 
training sample composition, and the piecewise nature of 
the output. Training on different samples may produce 
very different trees with similar separation power. The 
discrete output comes from the fact that the only possible 
values are the purities of each leaf and the number of 
leaves is finite. 

Decision tree techniques have interesting features, 
as follows: the tree has a human-readable structure, 
making it possible to know why a particular event was 
labeled signal or background; training is fast compared to 
neural networks; decision trees can use discrete variables 
directly; and no preprocessing of input variables is 
necessary. In addition, decision trees are relatively 
insensitive to extra variables: unlike neural networks, 
adding well-modeled variables that are not powerful 
discriminators does not degrade the performance of the 
decision tree (no additional noise is added to the system). 

1. Training

The process in which a decision tree is created is 
usually referred to as decision tree training. Consider 
a sample of known signal and background events where 
each event is defined by a weight w and a list of 
variables x. The following algorithm can be applied to 
such a sample in order to create a decision tree: 

1. Initially normalize the signal training sample to the
background training sample such that
$\sum w_{\rm signal} = \sum w_{\rm background}$.

2. Create the first node, containing the full sample. 

3. Sort events according to each variable in turn. For 
each variable, the splitting value that gives the best 
signal-background separation is found (more on 
this in the next section). If no split that improves 
the separation is found, the node becomes a leaf. 

4. The variable and split value giving the best 
separation are selected and the events in the node 
are divided into two subsamples depending on 
whether they pass or fail the split criterion. These 
subsamples define two new child nodes. 

5. If the statistics are too low in any node, it becomes 
a leaf. 

6. Apply the algorithm recursively from Step 3 until 
all nodes have been turned into leaves. 

Each of the leaves is assigned the purity value

$$ p = \frac{s}{s+b}, \tag{7} $$

where s (b) is the weighted sum of signal (background)
events in the leaf. This value is an approximation of the
discriminant D(x) defined in Eq. 5.

2. Splitting a Node

Consider an impurity measure i(t) for node t. 
Desirable features of such a function are that it should 
be maximal for an equal mix of signal and background 
(no separation), minimal for nodes with either only 
signal or only background events (perfect separation), 
symmetric in signal and background purity, and strictly 
concave in order to reward purer nodes. Several such 
functions exist in the literature. We have not found a 
significant advantage to any specific choice and hence use 
the common "Gini index" [59].

The impurity measure, or Gini index, is defined as

$$ i_{\rm Gini} = 2p(1-p) = \frac{2sb}{(s+b)^2}, \tag{8} $$

where s (b) is the sum of signal (background) weights
in a node. One can now define the decrease of impurity
(goodness of split) associated with a split S of node t into
children $t_P$ and $t_F$:

$$ \Delta i_{\rm Gini}(S,t) = i_{\rm Gini}(t) - p_P \cdot i_{\rm Gini}(t_P) - p_F \cdot i_{\rm Gini}(t_F), \tag{9} $$

where $p_P$ ($p_F$) is the fraction of events that passed
(failed) split S. The goal is to find the split S* that
maximizes the decrease of impurity, which corresponds to
finding the split that minimizes the overall tree impurity.
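
A minimal sketch of the split search for a single variable, using Eqs. (8)
and (9), is given below. It assumes that "passing" a split means having a
value below the cut, that the pass and fail fractions are weighted fractions,
and that candidate cuts are placed midway between adjacent sorted values; these
conventions and the function names are illustrative choices, not taken from the
analysis code.

```python
def gini(s, b):
    """Eq. (8): 2sb/(s+b)^2 for signal and background weight sums s, b."""
    n = s + b
    return 0.0 if n == 0 else 2.0 * s * b / (n * n)


def impurity_decrease(parent, passed, failed):
    """Eq. (9), with each argument a (s, b) pair of weight sums."""
    n = sum(parent)
    p_pass = sum(passed) / n
    p_fail = sum(failed) / n
    return gini(*parent) - p_pass * gini(*passed) - p_fail * gini(*failed)


def best_split(values, labels, weights):
    """Scan all cuts on one variable; return (cut, decrease of impurity).

    labels are 1 for signal and 0 for background. Returns (None, 0.0) when no
    cut improves the separation, in which case the node would become a leaf.
    """
    events = sorted(zip(values, labels, weights))
    s_tot = sum(w for _, y, w in events if y == 1)
    b_tot = sum(w for _, y, w in events if y == 0)
    best = (None, 0.0)
    s_pass = b_pass = 0.0
    for k in range(len(events) - 1):
        x, y, w = events[k]
        if y == 1:
            s_pass += w
        else:
            b_pass += w
        x_next = events[k + 1][0]
        if x_next == x:
            continue  # no cut can separate identical values
        decrease = impurity_decrease(
            (s_tot, b_tot), (s_pass, b_pass), (s_tot - s_pass, b_tot - b_pass)
        )
        if decrease > best[1]:
            best = (0.5 * (x + x_next), decrease)
    return best


if __name__ == "__main__":
    # Toy sample: background at low values, signal at high values.
    print(best_split([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1], [0.25] * 4))
```

In a full training, this search would be repeated for every variable at every
node, and the variable and cut with the largest decrease would define the two
child nodes.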

3. Boosting 

A powerful technique to improve the performance
of any weak classifier was introduced a decade ago:
boosting [60]. Boosting was recently used in high
energy physics with decision trees by the MiniBooNE
experiment [61, 62].

The basic principle of boosted decision trees is to
train a tree, minimize some error function, and create
a tree $T_{n+1}$ as a modification of tree $T_n$. The boosting
algorithm used in DØ's single top quark search is adaptive
boosting, known in the literature as AdaBoost [60].

Once a tree indexed by n is built with associated
discriminant $D_n(x)$, its associated error $\epsilon_n$ is calculated
as the sum of the weights of the misclassified events. An
event is considered misclassified if $|D_n(x) - y| > 0.5$, where
y is 1 for a signal event and 0 for background. The tree
weight is calculated according to

$$ \alpha_n = \beta \times \ln\frac{1-\epsilon_n}{\epsilon_n}, \tag{10} $$

where $\beta$ is the boosting parameter. For each misclassified
event, its weight $w_i$ is scaled by the factor $e^{\alpha_n}$ (which will
be greater than 1). Hence misclassified events will get
higher weights. A new tree indexed by n + 1 is created
from the reweighted training sample, now working harder
on the previously misclassified events. This is repeated
N times, where N, the number of boosting cycles, is
a parameter specified by the user. The final boosted
decision tree result for event i is

$$ D(x_i) = \frac{1}{\sum_{n=0}^{N} \alpha_n} \sum_{n=0}^{N} \alpha_n D_n(x_i). \tag{11} $$

In all of our tests, boosting improves performance. 
Another advantage of boosting decision trees is that 
averaging produces smoother approximations to D(x).
In this analysis 20 boosted trees are used for each analysis 
channel, which improves the performance by 5 to 10%. 
The increase in performance saturates in the region of 20 
boosting cycles, varying slightly from channel to channel. 
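
The boosting loop of Eqs. (10) and (11) can be sketched as follows. To keep
the example short, each weak classifier is a single cut on one variable with
output 0 or 1 instead of a full decision tree with leaf purities, and the event
weights are renormalized to unit sum after each cycle so that the error stays
a fraction; both are simplifying assumptions, and all names are hypothetical.

```python
import math


def train_stump(xs, ys, ws):
    """Weak classifier: the single cut on x with the smallest weighted error."""
    best = None
    for cut in sorted(set(xs)):
        for sign in (1, -1):
            pred = [1 if sign * (x - cut) >= 0 else 0 for x in xs]
            err = sum(w for p, y, w in zip(pred, ys, ws) if p != y)
            if best is None or err < best[0]:
                best = (err, cut, sign)
    _, cut, sign = best
    return lambda x: 1 if sign * (x - cut) >= 0 else 0


def boost(xs, ys, n_cycles=20, beta=0.2):
    """Return the boosted discriminant D(x) of Eq. (11)."""
    ws = [1.0 / len(xs)] * len(xs)
    trees, alphas = [], []
    for _ in range(n_cycles):
        tree = train_stump(xs, ys, ws)
        miss = [abs(tree(x) - y) > 0.5 for x, y in zip(xs, ys)]
        err = sum(w for w, m in zip(ws, miss) if m)
        if err <= 0.0 or err >= 0.5:
            # Perfect or no better than random: stop boosting.
            if not trees:
                trees, alphas = [tree], [1.0]
            break
        alpha = beta * math.log((1.0 - err) / err)          # Eq. (10)
        ws = [w * math.exp(alpha) if m else w for w, m in zip(ws, miss)]
        total = sum(ws)
        ws = [w / total for w in ws]                        # assumed renormalization
        trees.append(tree)
        alphas.append(alpha)
    norm = sum(alphas)
    return lambda x: sum(a * t(x) for a, t in zip(alphas, trees)) / norm


if __name__ == "__main__":
    # Toy, non-separable sample: 1 = signal, 0 = background.
    d = boost([1, 2, 3, 4, 5, 6], [0, 0, 1, 0, 1, 1])
    print([round(d(x), 3) for x in (1, 3, 4, 6)])
```

The averaging over reweighted classifiers gives intermediate outputs for events
near the class boundary, which is the smoothing effect mentioned above.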

4. Decision Tree Parameters

Several internal parameters can influence the 
development of a decision tree. 

• Initial normalization (Step 1 in Sec. XII A 1). In this
analysis, we normalize both signal and background
such that their sums of weights are both 0.5.

• Criteria to decide when to stop the splitting
procedure due to too low statistics (Step 5 in
Sec. XII A 1). In this analysis the minimum node
size is 100 events per node.

• Impurity function to use to find the best split. We
use the Gini index as mentioned in Sec. XII A 2.

• Number of boosting cycles. For this analysis we use 
20 boosting cycles. 

• Value of the boosting parameter $\beta$. We find $\beta$ = 0.2
gives the best expected separation.

B. Variable Selection 

A list of sensitive variables has been derived based 
on an analysis of the signal and background Feynman 
diagrams [HI, [54], [55], from studies of single top quark
production at next-to-leading order [63, 64], and from
other analyses [l(| [Hj]. The variables fall into three
categories: individual object kinematics, global event
kinematics, and variables based on angular correlations.
The complete list of 49 variables is shown in Table XII.

Previous iterations of the single top quark analysis at 
DØ [2l], [13, [HI have always used far fewer input variables.
One of the main reasons was that the discriminant was 
computed with neural networks. Introducing too many 
variables can degrade the performance of a network, and 
testing each combination of variables is time-consuming. 
However, we observe that adding more variables does 
not degrade the DT performance. If newly introduced 
variables have some discriminative power, they improve 
the performance of the tree. If they are not discriminative
enough, they are ignored. We tested this
empirical observation by training different trees using
several subsets of variables from the list of 49 variables.
Adding more variables to the training sets never degraded 
the performance of the trees. Therefore, rather than 
producing separately optimized lists of variables for each 
analysis channel, the full list of 49 variables is used in all 
cases. 



C. Decision Tree Training 

We train the decision trees on one third of the available 
simulated events and keep the rest of the events to 
measure the acceptances. As a cross check, we have also 
trained on one half and on two thirds of the sample and 
have found consistent results with those obtained from 
using only one third. We therefore only present results 
with one third of the sample used for training. 

Three signals are considered: 

• s-channel single top quark process only (tb) 

• t-channel single top quark process only (tqb) 

• s- and t-channel single top quark processes 
combined (tb+tqb) 

For simplicity, and because the decision trees are 
expected to deal well with all components at 
once, trees are trained against all backgrounds 
together rather than making separate trees for each 
background. The background includes simulated events
for $t\bar{t}\to\ell$+jets, $t\bar{t}\to\ell\ell$+jets, and W+jets (with three
separate components for $Wb\bar{b}$, $Wc\bar{c}$, and $Wjj$). Each
background component is represented in proportion to its
expected fraction in the background model. This leads
to three different decision trees (tb, tqb, tb+tqb against
$t\bar{t}$, W+jets) for each training. In the tb+tqb training, the
s- and t-channel components of the signal are taken in 
their SM proportions. 

Samples are split by lepton flavor, jet multiplicity, and
number of b-tagged jets. The current analysis uses the
following samples: one isolated electron or muon; 2, 3 or
4 jets; and 1 or 2 b tags. Each sample is treated independently
with its own training for each signal, leading to
36 different trees (3 signals x 2 lepton flavors x 3 jet
multiplicities x 2 b-tagging possibilities).
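
The bookkeeping behind the count of 36 trainings can be checked with a short
enumeration. The labels below are illustrative strings only.

```python
from itertools import product

SIGNALS = ("tb", "tqb", "tb+tqb")
LEPTONS = ("electron", "muon")
JET_MULTIPLICITIES = (2, 3, 4)
B_TAGS = (1, 2)

# One analysis channel per (lepton flavor, jet multiplicity, number of b tags).
channels = list(product(LEPTONS, JET_MULTIPLICITIES, B_TAGS))

# One decision tree training per (signal, channel) combination.
trainings = list(product(SIGNALS, channels))

print(len(channels), len(trainings))  # prints: 12 36
```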



XIII. BAYESIAN NEURAL NETWORKS 
ANALYSIS 

A. Introduction 

A neural network (NN) n(x, w) [66] is a nonlinear
function, with adjustable parameters w, which is
capable of modeling any real function of one or
more variables [67]. In particular, it can model the
discriminant D(x) in Eq. 5. Typically, one finds a
single point $w_0$ in the network parameter space for which
D(x) ≈ n(x, $w_0$). This can be achieved by minimizing an
error function that measures the discrepancy between the
value of the function n(x, w) and the desired outcome for
variables x: 1 for a signal event and 0 (or −1) if x pertains
to a background event. If the error function is built
using equal numbers of signal and background events, the
minimization yields the result D(x) = n(x, $w_0$) [68, 69],
provided that the function n(x, w) is sufficiently flexible
and that a sufficient number of training events are used.

One shortcoming of the minimization is its tendency,
unless due care is exercised, to pick a point $w_0$ that
fits the function n(x, w) too tightly to the training
data. This over-training can yield a function, n(x, $w_0$),
that is a poor approximation to the discriminant D(x)
(Eq. 6). In principle, the over-training problem can be
mitigated, and more accurate and robust estimates of
D(x) constructed, by recasting the task of finding the
best approximation to D(x) as one of inference from a
Bayesian viewpoint [70, 71]. The task is to infer the set
of parameters w that yield the best approximation of
n(x, w) to D(x).

Given training data T, which comprise an equal
admixture of signal and background events, one assigns
a probability p(w|T)dw to each point in the parameter
space of the network. Since each point w corresponds
to a network with a specific set of parameter values, the
probability p(w|T)dw quantifies the degree to which the
network is a good fit to the training data T. However,
instead of finding the best single point $w_0$, one averages
n(x, w) over every possible point w, weighted by the
probability of each point. A Bayesian neural network
(BNN) [70, 71] is defined by the function

$$ n(x) = \int n(x, w)\, p(w|T)\, dw, \tag{12} $$

that is, it is a weighted average over all possible network
functions of a given architecture. The calculation is
Bayesian because one is performing an integration over
a parameter space. If the function p(w|T) is sufficiently
smooth, one would expect the averaging in Eq. 12 to
yield a more robust and more accurate estimate of the
discriminant D(x) than from a single best point $w_0$.

There is, however, a practical difficulty with Eq. 12: it
requires the evaluation of a complicated high-dimension
integral. Fortunately, this is feasible using sophisticated
numerical methods, such as Markov Chain Monte
Carlo [TO, [zl, [z3 ■ We use this method to sample from 
the posterior density p(w|T) and to approximate Eq. 12
by the sum

$$ n(x) \approx \frac{1}{K} \sum_{k=1}^{K} n(x, w_k), \tag{13} $$

where K is the sample size.

We perform the Bayesian neural network calculations
for this analysis using the "Software for Flexible Bayesian
Modeling" package [74].

1. BNN Posterior Density

Given training events T = (t, x), where t denotes the
targets (1 for signal and 0 for background) and x
denotes the set of associated variables, we construct the
posterior probability density p(w|T) via Bayes' theorem

$$ p(w|T) = \frac{p(T|w)\,p(w)}{p(T)} = \frac{p(t|x,w)\,p(x|w)\,p(w)}{p(t|x)\,p(x)} = \frac{p(t|x,w)\,p(w)}{p(t|x)}, \tag{14} $$

with p(x|w) = p(x). We see that there are two functions
to be defined: the likelihood p(t|x, w) and the prior
probability density p(w). For this analysis, the neural
network functions have the form

$$ n(x,w) = \frac{1}{1 + \exp[-f(x,w)]}, \tag{15} $$

where

$$ f(x,w) = b + \sum_{h=1}^{H} v_h \tanh\Big(a_h + \sum_{i=1}^{I} u_{hi} x_i\Big). \tag{16} $$

H is the number of hidden nodes and I is the number of
input variables, x. The adjustable parameters w of the
networks are $u_{hi}$ and $v_h$ (the weights) and $a_h$ and b (the
biases).
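
The network function of Eqs. (15) and (16) and the posterior average of
Eq. (13) can be written compactly in Python. The dictionary layout used for the
parameter point w (keys "b", "v", "a", "u") and the function names are
assumptions made for this sketch; in practice the points w_k would come from
the posterior sampling, whereas here they are arbitrary illustrative values.

```python
import math


def f(x, w):
    """Eq. (16): w = {"b": float, "v": [H], "a": [H], "u": [H][I]}."""
    return w["b"] + sum(
        v_h * math.tanh(a_h + sum(u_hi * x_i for u_hi, x_i in zip(u_h, x)))
        for v_h, a_h, u_h in zip(w["v"], w["a"], w["u"])
    )


def network(x, w):
    """Eq. (15): logistic function of f(x, w)."""
    return 1.0 / (1.0 + math.exp(-f(x, w)))


def bnn(x, samples):
    """Eq. (13): average of the network output over K sampled points w_k."""
    return sum(network(x, w) for w in samples) / len(samples)


if __name__ == "__main__":
    # Two hypothetical parameter points for a network with H = 2, I = 2.
    w1 = {"b": 0.1, "v": [0.5, -0.3], "a": [0.0, 0.2], "u": [[1.0, -1.0], [0.5, 0.5]]}
    w2 = {"b": -0.1, "v": [0.4, -0.2], "a": [0.1, 0.1], "u": [[0.9, -1.1], [0.4, 0.6]]}
    print(bnn([0.3, 0.7], [w1, w2]))
```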



2. BNN Likelihood 

If x are the variables for an event, then the event's
probability to be signal is n(x,w); if it is a background
event, then its probability is 1 − n(x, w). Therefore, the
probability of the training event set is

$$ p(t|x,w) = \prod_{j=1}^{N} n_j^{t_j} (1 - n_j)^{1 - t_j}, \tag{17} $$

where $n_j = n(x_j, w)$, $t_j$ = 1 for signal and $t_j$ = 0 for
background, and N is the total number of events. The BNN
likelihood is proportional to this probability.
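
In numerical work the product in Eq. (17) is usually handled through its
logarithm, which turns it into a sum. A minimal sketch follows; the function
name is hypothetical and the network outputs are assumed to lie strictly
between 0 and 1.

```python
import math


def log_likelihood(targets, outputs):
    """ln of Eq. (17): sum_j [ t_j ln n_j + (1 - t_j) ln(1 - n_j) ]."""
    return sum(
        t * math.log(n) + (1 - t) * math.log(1.0 - n)
        for t, n in zip(targets, outputs)
    )


if __name__ == "__main__":
    # One signal event with output 0.9 and one background event with output 0.2:
    # the likelihood is 0.9 * (1 - 0.2) = 0.72.
    print(math.exp(log_likelihood([1, 0], [0.9, 0.2])))
```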



3. BNN Prior Density

The last ingredient needed to complete the Bayesian
calculation is a prior probability density. This is the
most difficult function to specify. However, experience
suggests that for each network parameter, a Gaussian
centered at the origin of the parameter space produces
satisfactory results. Moreover, the widths of the
Gaussian should be chosen to favor parameter values
close to the origin, since smaller parameter values
yield smoother approximations to D(x). Conversely,
large parameter values yield jagged approximations.
However, since one does not know a priori what widths
are appropriate, initially we allowed their values to
adapt according to the noise level in the training data.
Subsequently, we found that excessive noise in the
training data can cause the parameter values to grow too
large. Therefore, we now keep the widths fixed to a small
set of values determined using single neural networks.
This change is an improvement over the method used in
Ref. HI.



4. Sampling the BNN Posterior Density

To compute the average in Eq. 13 requires a sample
of points w from the posterior density, p(w|T). These
points are generated using a Markov Chain Monte Carlo
method. We first write the posterior density as

$$ p(w|T) = \exp[-V(w)], \tag{18} $$

where V(w) = −ln p(w|T) may be thought of as a
"potential" through which a "particle" moves. We then
add a "kinetic energy" term $T(p) = \frac{1}{2}p^2$, where p is a
vector of the same dimensionality as w, which together
with the potential yields the particle's "Hamiltonian"
H = T + V. For a system governed by a Hamiltonian,
every phase space point (w, p) will be visited eventually
in such a way that the phase space density of points is
proportional to exp(−H). The phase space is traversed
by alternating between long deterministic trajectories
and stochastic changes in momentum. After every
random change, one decides whether or not to accept
the new phase space point: the new state is accepted if
the energy has decreased, and accepted with a probability
less than one if the contrary is true. This algorithm yields
a Markov chain $w_1, w_2, \ldots, w_K$ of points, which converges
to a sequence of points that constitute a faithful sample
from the density p(w|T). In our calculations, each
deterministic trajectory comprises 100 steps, followed
by a randomization of the momentum. This creates a
point that could be used in Eq. 13. However, since
the correlation between adjacent points is high, this pair
of actions is repeated 20 times, which constitutes one
iteration of the algorithm, and a point is saved after each
iteration.
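
The sampling of the posterior density described in the subsection "Sampling
the BNN Posterior Density" can be illustrated with a generic Hamiltonian
(hybrid) Monte Carlo sketch. This is not the implementation of the software
package used in the analysis: the leapfrog integrator, the Gaussian momentum
refresh before each trajectory, the step size, and all function names are
assumptions of this sketch. The defaults of 100 steps per trajectory and 20
trajectories per saved point follow the numbers quoted in the text.

```python
import math
import random


def leapfrog(w, p, grad_v, n_steps, step):
    """Deterministic trajectory: n_steps leapfrog steps in the potential V."""
    w, p = list(w), list(p)
    g = grad_v(w)
    for _ in range(n_steps):
        p = [pi - 0.5 * step * gi for pi, gi in zip(p, g)]
        w = [wi + step * pi for wi, pi in zip(w, p)]
        g = grad_v(w)
        p = [pi - 0.5 * step * gi for pi, gi in zip(p, g)]
    return w, p


def hamiltonian(w, p, v):
    """H = T + V with kinetic term T(p) = p^2 / 2."""
    return v(w) + 0.5 * sum(pi * pi for pi in p)


def sample_posterior(v, grad_v, w0, n_points,
                     n_steps=100, step=0.05, n_repeat=20, seed=0):
    """Return a chain of n_points parameter points distributed as exp(-V)."""
    rng = random.Random(seed)
    w = list(w0)
    chain = []
    for _ in range(n_points):
        for _ in range(n_repeat):
            p = [rng.gauss(0.0, 1.0) for _ in w]   # stochastic momentum change
            h_old = hamiltonian(w, p, v)
            w_new, p_new = leapfrog(w, p, grad_v, n_steps, step)
            h_new = hamiltonian(w_new, p_new, v)
            # Accept if the energy decreased, else with probability exp(-dH).
            if h_new <= h_old or rng.random() < math.exp(h_old - h_new):
                w = w_new
        chain.append(list(w))                      # one saved point per iteration
    return chain


if __name__ == "__main__":
    # Toy potential: V(w) = w^2 / 2, i.e. a unit Gaussian "posterior".
    v = lambda w: 0.5 * sum(x * x for x in w)
    grad_v = lambda w: list(w)
    chain = sample_posterior(v, grad_v, [0.0], n_points=50)
    print(sum(c[0] for c in chain) / len(chain))
```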



B. BNN Training 

The Bayesian neural networks are trained separately
for each of the 12 analysis channels, with different sets
of variables in each channel. The variables are selected
using an algorithm called RuleFit [75] that orders them
according to their discrimination importance (on a scale
of 1 to 100). Variables with discrimination importance
greater than 10 are used, which results in the selection
of between 18 and 25 variables in the different channels.
For example, the variables for the electron+2jets/1tag
channel are shown in Fig. 12. Each network contains a
single hidden layer with 20 nodes, with the sample size K
set to 100. The number of signal and background events
used in the training is 10,000 each.

FIG. 12: BNN input variables according to their RuleFit
ranking for the electron+2jets/1tag channel.




and da is the differential cross section, F is the flux 
factor, and d<& is the Lorentz invariant phase space factor. 
The ME analysis builds a discriminant directly using 
Eq. Q3B thereby potentially making use of all the available 
kinematic information in the event. In particular, the 
method uses 



1 dfTj 

p(x process. ) = — 

a, dx 



(20) 



where x is the configuration of the event, and 
p(x|procesSj) is the probability density to observe x 
given that the physics process is process i to build the 
discriminant given in Eq. [6j 

For each data and simulated event, two discriminant
values are calculated: a t-channel discriminant and an
s-channel discriminant. The t-channel discriminant uses
the t-channel matrix elements when calculating p(x|S) as
in Eq. 6, while the s-channel discriminant uses s-channel
matrix elements. For each analysis channel, these
discriminant values are plotted in a two-dimensional
histogram, out of which a cross section measurement is
extracted, as will be discussed in Sec. XVIII. The ME
analysis only uses events with two or three jets and one
or two b tags, and given the two types of leptons, that
results in eight independent analysis channels.

The matrix element method was developed by DØ
to measure the top quark mass [76] and has been
used by DØ [z3l and CDF [zl, S 1<| for subsequent
measurements. The ME method has also been used to
measure the longitudinal W boson helicity fraction in top
quark decays [81]. The result detailed here marks the first
use of the method to separate signal from background in
a particle search (2{| .



XIV. MATRIX ELEMENTS ANALYSIS 

The main idea behind the matrix element (ME)
technique is that the physics of a collision, including all
correlations, is contained in the matrix element M, where

$$ d\sigma = \frac{(2\pi)^4 |M|^2}{F} \, d\Phi \tag{19} $$



A. Event Probability Density Functions 

The event configuration, x, represents the set of
reconstructed four-momenta for all selected final state
objects, plus any extra reconstruction-level information,
such as whether a jet is b tagged, if there is a muon in a
jet, the quality of the muon track, and so on. However,
the matrix element, M, depends on the parton-level
configuration of the event, which we label y. The differential
cross section, dσ/dx, can be related to the parton-level
variant, dσ/dy, by integrating over all the possible
parton values, using the parton distribution functions
to relate the initial state partons to the proton and
antiproton, and using a transfer function to relate the
outgoing partons to the reconstructed objects:

$$ \frac{d\sigma}{dx} = \sum_j \int dy \left[ f_{1,j}(q_1, Q^2) \, f_{2,j}(q_2, Q^2) \, \frac{d\sigma_{\mathrm{hs},j}}{dy} \, W(x \mid y, j) \, \Theta_{\mathrm{parton}}(y) \right] \tag{21} $$

where

• Σ_j is the sum over different configurations that
contribute to the differential cross section: it
is the discrete analogue to ∫dy. Specifically,
this summation includes summing over the initial
parton flavors in the hard scatter collision and the
different permutations of assigning jets to partons.

• ∫dy is an integration over the phase space:

$$ \int dy = \int dq_1 \, dq_2 \, d^3p_\ell \, d^3p_\nu \, d^3p_{q_1} \, d^3p_{q_2} \tag{22} $$

Many of these integrations are reduced by delta
functions.

• f_{n,j}(q, Q²) is the parton distribution function in
the proton or antiproton (n = 1 or 2, respectively)
for the initial state parton associated with configuration
j, carrying momentum q, evaluated at the
factorization scale Q². We use the same factorization
scales as used when the simulated samples
were generated. This analysis uses CTEQ6L1 [42]
leading-order parton distribution functions via
LHAPDF [82].

• dσ_hs/dy is the differential cross section for the hard
scatter (HS) collision. It is proportional to the
square of the leading-order matrix element as given
by (cf. Eq. 19):

$$ \frac{d\sigma_{\mathrm{hs},j}}{dy} = \frac{(2\pi)^4}{4\sqrt{(q_1 \cdot q_2)^2 - m_1^2 m_2^2}} \, |M|^2 \tag{23} $$

where q and m are the four-momenta and masses
of the initial-state partons.

• W(x|y,j) is called the transfer function; it
represents the conditional probability to observe
configuration x in the detector given the original
parton configuration (y, j). The transfer function
is divided into two parts:

$$ W(x \mid y, j) = W_{\mathrm{perm}}(x \mid y, j) \, W_{\mathrm{reco}}(x \mid y, j) \tag{24} $$

where W_perm(x|y,j), discussed in Sec. XIV A 3,
is the weight assigned to the given jet-to-parton
permutation and W_reco(x|y,j), discussed
in Sec. XIV A 4, relates the reconstructed value to
parton values for a given permutation.

• Θ_parton(y) represents the parton-level cuts applied
in order to avoid singularities in the matrix element
evaluation.



VEGAS Monte Carlo integration is used, as
implemented in the GNU Scientific Library [83, 84].
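
The role of the numerical integration in Eq. 21 can be illustrated with a deliberately simplified, one-dimensional sketch. It uses plain Monte Carlo sampling from the Python standard library rather than the VEGAS algorithm used in the analysis, and the falling parton spectrum, the 10 GeV Gaussian resolution, and all function names are hypothetical choices made only for illustration.

```python
import math
import random


def parton_density(y):
    """HYPOTHETICAL parton-level spectrum: exponential with a 20 GeV slope."""
    return math.exp(-y / 20.0) / 20.0


def transfer(x, y, resolution=10.0):
    """HYPOTHETICAL Gaussian transfer function W(x | y) with fixed resolution."""
    z = (x - y) / resolution
    return math.exp(-0.5 * z * z) / (resolution * math.sqrt(2.0 * math.pi))


def folded_density(x, n_points, rng, y_max=200.0):
    """Estimate integral_0^y_max g(y) W(x | y) dy with uniformly sampled y."""
    total = 0.0
    for _ in range(n_points):
        y = rng.uniform(0.0, y_max)
        total += parton_density(y) * transfer(x, y)
    # Mean of the integrand times the length of the sampled interval.
    return y_max * total / n_points


rng = random.Random(7)
estimate = folded_density(40.0, 20000, rng)
print(f"reconstruction-level density at x = 40 GeV: {estimate:.5f}")
```

Uniform sampling becomes inefficient in the 13 to 20 dimensions quoted for Eq. 25, which is one reason an adaptive algorithm such as VEGAS is a natural choice there.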

The probability to observe a particular event given a
process hypothesis, Eq. 20, also requires the total cross
section (× branching fraction) as a normalization. The
total cross section (σ) is just an integration of Eq. 21:

$$ \sigma = \int dx \, \frac{d\sigma}{dx} \, \Theta_{\mathrm{reco}}(x) \tag{25} $$

The term Θ_reco(x) approximates the selection cuts.
While conceptually simple, Eq. 25 represents a large
integral: 13 dimensions for two-jet events, 17 dimensions
for three-jet events other than tt̄, and 20 dimensions for
tt̄ events. However, this integral needs to be calculated
only once, not once per event, so the actual integration
time is insignificant.



1. Matrix Elements 

The matrix elements used in this analysis are listed in
Table XIII. The code to calculate the matrix elements
is taken from the MADGRAPH [85] leading-order
matrix-element generator and uses the HELAS [86] routines to
evaluate the diagrams. In Table XIII, for the single
top quark processes, the top quark is assumed to decay
leptonically: t → W⁺b → ℓ⁺νb. For the W+jets processes,
the W boson is also assumed to decay leptonically:
W⁺ → ℓ⁺ν. Charge-conjugate processes are included.
The same matrix elements are used for both the electron
and muon channels. Furthermore, we use the same
matrix elements for heavier generations of incoming
quarks, assuming a diagonal CKM matrix. In other
words, for the tb process, we use the same matrix element
for ud̄ and cs̄ initial-state partons.

New to the analysis after the result published in
Ref. [25] is an optimization of the three-jet analysis
channel. For these events, a significant fraction of the
background is tt̄ → ℓ+jets, as can be seen from the yield
tables (see, e.g., Table X). While no new processes are
added to the two-jet analysis, tqg, Wcgg, Wggg, and
tt̄ → ℓ+jets are now included in the three-jet analysis.



2. Top Pairs Integration 

For the tt̄ → ℓ+jets integration, we cannot assume one-to-one
matching of parton to reconstructed object. The
final state has four quarks, so one-to-one matching would
lead to a four-jet event. We are interested, however, in
using the tt̄ → ℓ+jets matrix element in the three-jet bin.

TABLE XIII: The matrix elements used in this analysis. The numbers in
parentheses specify the number of Feynman diagrams included in each process.
For simplicity, only processes that contain a positively-charged lepton in the final
state are shown. The charge-conjugated processes are also used.

                            Matrix Elements
          Two Jets                           Three Jets
Name      Process                Name           Process
Signals                          Signals
tb        ud̄ → tb̄ (1)            tbg            ud̄ → tb̄g (5)
tq        ub → td (1)            tqg            ub → tdg (5)
          d̄b → tū (1)                           d̄b → tūg (5)
                                 tqb            ug → tdb̄ (4)
                                                d̄g → tūb̄ (4)
Backgrounds                      Backgrounds
Wbb       ud̄ → Wbb̄ (2)           Wbbg           ud̄ → Wbb̄g (12)
Wcg       s̄g → Wc̄g (8)           Wcgg           s̄g → Wc̄gg (54)
Wgg       ud̄ → Wgg (8)           Wggg           ud̄ → Wggg (54)
                                 tt̄ → ℓ+jets    qq̄ → tt̄ → ℓ⁺νbūdb̄ (3)
                                                gg → tt̄ → ℓ⁺νbūdb̄ (3)



The tt̄ events therefore have to "lose" one jet to enter
this bin. One way that a jet could be lost is by having its
reconstructed pT be below the selection threshold, which
is 15 GeV. Another way to lose a jet is if it is merged
with another nearby jet. The jet could also be outside
the η acceptance of the analysis with |η| > 3.4. There is
in addition a general reconstruction inefficiency that can
cause a jet to be lost, but it is a small effect.

A study of tt̄ → e+jets simulated events before tagging
shows that 80% of the time when a jet is lost, there is no
other jet that passes the selection cuts within ΔR < 0.5,
that is, it has not been merged with another jet. The
transverse momentum of quarks not matched to a jet
passing the selection cuts is peaked at around 15 GeV,
indicating that the jet is often lost because it falls below
the jet pT threshold. This study shows that the light-quark
jets, which have a softer pT spectrum, are 1.7 times
as likely to be lost owing to the pT cut as the heavy-quark
jets. This observation motivated the following
simplification: assume that the lost jet is from a light
quark coming from the hadronically decaying W boson.
In the most common case, the probability assigned to
losing a jet given parton transverse energy E_T^parton is
the probability that the jet is reconstructed to have
E_T^reco < 15 GeV, which can be calculated from the jet
transfer function W_jet (discussed in Sec. XIV A 4):

$$ \max\left( \int_0^{15\,\mathrm{GeV}} dE_T^{\mathrm{reco}} \, W_{\mathrm{jet}}(E_T^{\mathrm{reco}} \mid E_T^{\mathrm{parton}}), \; 0.05 \right). \tag{26} $$

A minimum probability of 5% is used to account for
other inefficiencies in reconstructing a jet. A random
number determines which of the two quarks coming from
the W boson is lost for a particular sample point in the
MC integration. Other special cases considered are when
the two light quarks have ΔR(q1, q2) < 0.6, in which case
they are assumed to merge, or if the pseudorapidity of
the quark is outside our acceptance, in which case it is
assumed lost.
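
As an illustration of Eq. 26, the following sketch evaluates the lost-jet probability for a double-Gaussian jet transfer function. The functional form follows the description in Sec. XIV A 4, but the numerical parameters and the function names are hypothetical placeholders, not the fitted values used in the analysis.

```python
import math


def gaussian_cdf(x, mean, sigma):
    """Cumulative distribution of a Gaussian evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def lost_jet_probability(et_parton, threshold=15.0, floor=0.05):
    """Eq. 26: max(integral_0^threshold W_jet(E_reco | E_parton) dE_reco, floor).

    W_jet is modeled as a double Gaussian in (E_reco - E_parton).
    All shape parameters below are HYPOTHETICAL, chosen for illustration.
    """
    components = (
        (0.8, -2.0, 0.12 * et_parton + 3.0),   # (fraction, offset, width): core
        (0.2, -8.0, 0.25 * et_parton + 6.0),   # (fraction, offset, width): tail
    )
    prob = 0.0
    for fraction, offset, sigma in components:
        centre = et_parton + offset
        prob += fraction * (gaussian_cdf(threshold, centre, sigma)
                            - gaussian_cdf(0.0, centre, sigma))
    return max(prob, floor)


p_soft = lost_jet_probability(10.0)    # parton well below the 15 GeV threshold
p_mid = lost_jet_probability(30.0)     # parton moderately above the threshold
p_hard = lost_jet_probability(100.0)   # hard parton: the 5% floor applies
print(p_soft, p_mid, p_hard)
```

The floor matters mainly for hard partons, where the integral of the transfer function below threshold is far smaller than the general reconstruction inefficiency it is meant to represent.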



3. Assignment Permutations 

The (discrete) summation over different configurations
incorporated in Eq. 21 includes the summation over the
different ways to assign the partons to the jets. A weight
for each permutation is included as the W_perm part of
the transfer function. This analysis uses two pieces of
information to determine the weight, namely b tagging
and muon charge (the muon from b decay):

$$ W_{\mathrm{perm}} = W_{b\mathrm{tag}} \, W_{\mu\mathrm{charge}}. \tag{27} $$

The b-tagging weight is assumed to factor by jet:

$$ W_{b\mathrm{tag}} = \prod_i w_{b\mathrm{tag}}(\mathrm{tag}_i \mid \alpha_i, p_{Ti}, \eta_i), \tag{28} $$

where α_i is the flavor of quark i and tag_i is true or false
depending on whether the jet is b tagged or not. The
weights assigned to cases with and without a b tag are:

$$ w_{b\mathrm{tag}}(\mathrm{tag} \mid \alpha, p_T, \eta) = P^{\mathrm{taggable}}(p_T, \eta) \, \varepsilon_\alpha(p_T, \eta) $$
$$ w_{b\mathrm{tag}}(\mathrm{notag} \mid \alpha, p_T, \eta) = 1 - P^{\mathrm{taggable}}(p_T, \eta) \, \varepsilon_\alpha(p_T, \eta) $$

where ε_α is the tag-rate function for the particular quark
flavor and P^taggable is the taggability-rate function, which
is the probability that a jet is taggable.
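
A minimal sketch of how Eq. 28 separates jet-to-parton assignments is given below. The tag rates and taggability are hypothetical constants here, whereas in the analysis they are functions of pT and η; the function names are likewise illustrative.

```python
def btag_weight(is_tagged, taggability, tag_rate):
    """Single-jet weight: P_taggable * eps if tagged, else 1 - P_taggable * eps."""
    p_tag = taggability * tag_rate
    return p_tag if is_tagged else 1.0 - p_tag


def permutation_btag_weight(jets):
    """Eq. 28: product of single-jet weights for one jet-to-parton assignment.

    `jets` is a list of (is_tagged, taggability, tag_rate) tuples, where
    tag_rate is evaluated for the quark flavor ASSUMED in this permutation.
    """
    weight = 1.0
    for is_tagged, taggability, tag_rate in jets:
        weight *= btag_weight(is_tagged, taggability, tag_rate)
    return weight


# HYPOTHETICAL rates: 50% for b jets, 1% for light jets, 80% taggability.
EPS_B, EPS_LIGHT, TAGGABILITY = 0.50, 0.01, 0.80

# Event with jet A tagged and jet B untagged, under the two possible assignments.
w_a_is_b = permutation_btag_weight(
    [(True, TAGGABILITY, EPS_B), (False, TAGGABILITY, EPS_LIGHT)])
w_b_is_b = permutation_btag_weight(
    [(True, TAGGABILITY, EPS_LIGHT), (False, TAGGABILITY, EPS_B)])
print(w_a_is_b, w_b_is_b)
```

With these illustrative numbers the assignment in which the tagged jet is the b quark receives a weight roughly eighty times larger than the swapped assignment.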

For the s-channel matrix element and for the tt̄ → ℓ+jets
matrix element, there are both a b quark and a b̄ quark
in the final state. Furthermore, the matrix element is not
symmetric with respect to the interchange of the b and b̄
quarks, so it is helpful to be able to distinguish between
b jets and b̄ jets to make the correct assignment. In the
case of muonic decays of the b or b̄ quark, it is possible to
distinguish between the jets by the charge of the decay
muon. One complication is that a charm quark may also
decay muonically, and the charge of the muon differs
between b → cμ⁻ν̄ and b → cXX′ → sμ⁺νXX′. However,
because p_T^rel, the muon transverse momentum relative
to the jet axis, differs in the two cases, the charge of
the muon still provides information. Similarly to W_btag,
we assign the muon charge weight W_μcharge based on
whether the jet, if it is assumed to be a b or b̄ in the
given permutation, contains a muon of the appropriate
charge. The weight is calculated by the probability that
a b or a b̄ quark decays directly into a muon given that
there is a muon in the jet, parametrized as a function of
p_T^rel of the muon.



4. Object Transfer Functions

We assume that the parton-level to reconstruction-level
transfer function, W_reco in Eq. 24, can be factorized
into individual per-object transfer functions:

$$ W_{\mathrm{reco}}(x \mid y, j) = \prod_i W_{ij}(x_i \mid y_i), \tag{29} $$

where W_ij(x_i|y_i) is a transfer function for one object
(a jet, a muon, or an electron), and x_i and y_i
are reconstructed and parton-level information, respectively,
for that object. We assume that angles are
well measured, so the only transfer functions that are
not delta functions are those for energy (for jets and
electrons) and 1/pT (for muons). The jet transfer
functions, which give the probability to measure a jet
energy given a certain parton energy, are parametrized
as double Gaussians in four pseudorapidity ranges, for
light jets, for b jets with a muon within the jet, and for
b jets with no muon in the jet. The electron and muon
transfer functions are parametrized as single Gaussians.

The jet and muon transfer functions are measured
in PYTHIA tt̄ → ℓ+jets simulated events. The electron
transfer functions are based on the electron resolution
measured in single electron and Z boson peak simulated
events.



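$$ p(x \mid 2\mathrm{jet}, t) = \frac{1}{\sigma_{tq}} \frac{d\sigma_{tq}}{dx} \tag{31} $$

$$ p(x \mid 3\mathrm{jet}, s) = \frac{1}{\sigma_{tbg}} \frac{d\sigma_{tbg}}{dx} \tag{32} $$

$$ p(x \mid 3\mathrm{jet}, t) = \frac{1}{\sigma_{tqb} + \sigma_{tqg}} \frac{d(\sigma_{tqb} + \sigma_{tqg})}{dx} \tag{33} $$

Equation 33 can also be written as

$$ p(x \mid 3\mathrm{jet}, t) = w_{tqb} \, p(x \mid tqb) + w_{tqg} \, p(x \mid tqg), \tag{34} $$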



where w_tqb and w_tqg are the relative yields of the two
signal processes. Calculating the yield fractions using
Eq. 25, for single-tagged events we use w_tqb = 0.6 and
w_tqg = 0.4, while for double-tagged events we use w_tqb =
1 and w_tqg = 0.

We apply the same methodology of using weights based
on yield fraction for the p(x|background) calculations.
We do not use a matrix element for every background
that exists, however, so the yield fractions cannot
be determined as for the signal probabilities. Some,
such as ud̄ → Wcc̄, are not included because they
have similar characteristics to ones that are included,
such as ud̄ → Wbb̄. Therefore, we use the yields as
determined from the simulated samples and consider
what background the matrix elements are meant to
discriminate against. We find the performance of the
discriminant to be not very sensitive to the chosen
weights if the weights are reasonable, and have used the
weights given in Table XIV.



TABLE XIV: Background weights chosen for each analysis
channel in two-jet and three-jet events.

                     Background Fractions
                      1 tag               2 tags
Weight            Electron   Muon     Electron   Muon
Two-Jet Events
w_Wbb               0.55     0.60       0.83     0.87
w_Wcg               0.15     0.15       0.04     0.04
w_Wgg               0.35     0.30       0.13     0.09
Three-Jet Events
w_Wbbg              0.35     0.45       0.30     0.40
w_Wcgg              0.10     0.10       0.02     0.03
w_Wggg              0.30     0.25       0.13     0.10
w_tt̄→ℓ+jets         0.25     0.20       0.55     0.47
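
To make the use of these weights concrete, the sketch below combines per-process densities into a background density and forms a discriminant. The weights are the two-jet, single-tag electron values of Table XIV; the density values are hypothetical, and the discriminant form D = p(x|S) / (p(x|S) + p(x|B)) is an assumption about Eq. 6, which lies outside this section.

```python
def mixture_density(weights, densities):
    """Weighted sum of per-process probability densities for one event."""
    return sum(weights[name] * densities[name] for name in weights)


def discriminant(p_signal, p_background):
    """ASSUMED discriminant form: D = p(x|S) / (p(x|S) + p(x|B))."""
    return p_signal / (p_signal + p_background)


# Table XIV weights: two-jet events, electron channel, 1 tag.
weights = {"Wbb": 0.55, "Wcg": 0.15, "Wgg": 0.35}
# HYPOTHETICAL density values p(x | process) for a single event.
densities = {"Wbb": 0.20, "Wcg": 0.10, "Wgg": 0.05}

p_background = mixture_density(weights, densities)
d_value = discriminant(0.30, p_background)   # 0.30 is a HYPOTHETICAL p(x|S)
print(p_background, d_value)
```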



B. Single Top Quark Discriminants 

We build separate s-channel and t-channel discriminants,
D_s and D_t. The signal probability densities for
the various channels are:

$$ p(x \mid 2\mathrm{jet}, s) = \frac{1}{\sigma_{tb}} \frac{d\sigma_{tb}}{dx} \tag{30} $$

XV. MULTIVARIATE OUTPUT DISTRIBUTIONS

Discriminant output shapes for signal and different
background components are shown in Fig. 13,
demonstrating the ability of the three analyses to
separate signal from background. The DT discriminant
is narrower and more central owing to the averaging
effect of boosting (according to Eq. 11). The separation
powers of the discriminants shown in Fig. 13 are more
directly visualized in Fig. 14.

The discriminant outputs for the data and the
expected standard model contributions are shown in
Fig. 15 for the three multivariate techniques. The
outputs show good agreement between data and
backgrounds, except in the high discriminant regions,
where an excess of data over the background prediction
is observed.

XVI. ENSEMBLES AND BIAS STUDIES 

We have described three sophisticated analyses (DT, 
BNN, ME), each of which produces a posterior density 
for the single top quark production cross section. When 
applied to real data, we obtain well-behaved posterior 
densities. However, this does not guarantee that these 
methods are trustworthy and perform as advertised. 
In order to validate the methods, it is necessary to 
study their behavior on ensembles of pseudodatasets 
with characteristics as close as possible to those of the 
real data. We can use such ensembles to determine, 
for example, whether an analysis is able to extract a 
cross section from a signal masked by large backgrounds. 
We can also determine whether the claimed accuracy is 
warranted. Moreover, by running the three analyses on 
exactly the same ensembles, we can study in detail the 
correlations across analyses and the frequency properties 
of combined results and their significance. 

We generate pseudodatasets from a pool of weighted 
signal and background events, separately for the electron 
and muon channels. For example, out of 1.3 million 
electron events, we calculate a total background yield of 
756 events in the selected data. We randomly sample 
a count N from a Poisson distribution of mean n = 
756 and select N events, with replacement, from the 
pool of 1.3 million weighted events so that events are 
selected with a frequency proportional to their weight. 
The sample contains the appropriate admixture of signal 
and background events, as well as the correct Poisson 
statistics. Moreover, we take into account the fact that 
the multijets and W+jets sample sizes are 100% anticorrelated.
The sample is then partitioned according to
the b tag and jet multiplicities, mirroring what is done 
to the real data. The Poisson sampling, followed by 
sampling with replacement, is repeated to generate as 
many pseudodatasets as needed. Each pseudodataset is 
then analyzed in exactly the same way as real data. 
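
The two-step sampling described above can be sketched as follows. The pool and its weights are hypothetical stand-ins for the weighted simulated events, and the sketch assumes that the mean yield equals the sum of the event weights; the anticorrelation between the multijets and W+jets sample sizes is not modeled.

```python
import random


def sample_poisson(mean, rng):
    """Draw a Poisson count by counting unit-rate exponential arrivals before `mean`."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def make_pseudodataset(pool, weights, rng):
    """Poisson-fluctuate the total yield, then sample events with replacement.

    Events are drawn with a frequency proportional to their weight.
    ASSUMPTION: the mean yield is the sum of the event weights.
    """
    n_events = sample_poisson(sum(weights), rng)
    return rng.choices(pool, weights=weights, k=n_events)


rng = random.Random(1)
pool = list(range(1000))       # stand-ins for simulated events
weights = [0.756] * 1000       # HYPOTHETICAL weights summing to a yield of 756
pseudodata = make_pseudodataset(pool, weights, rng)
print(len(pseudodata))
```

Repeating the call to `make_pseudodataset` yields as many pseudodatasets as needed, each with its own Poisson-fluctuated size.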

We have performed studies using many different 



ensembles, of which the most important ones are:

• Background only (i.e., zero signal) ensemble
with systematics: the background is set to the
estimated background yield value; the signal cross
section is set to 0 pb; these Poisson-smeared means
are further randomized to represent the effects of
all systematic uncertainties.

• Standard model signal ensemble with
systematics: the background is set to the
estimated background yield value; the signal cross
section is set to the standard model value of
2.86 pb; these Poisson-smeared means are further
randomized to represent the effects of all systematic
uncertainties.

• Ensembles with different signal cross
sections: the background is set to the estimated
background yield value; the signal cross section is
set to a fixed value between 0 pb and a few times
the standard model value in each ensemble; only
Poisson-smearing for statistical effects is applied.

We use the zero-signal ensemble (with systematics) to
calculate the p-value, a measure of the significance of the
observed excess. The p-value is the probability that we
obtain a measured cross section greater than or equal to
the observed cross section, if there were no signal present
in the data.
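
The p-value definition above amounts to a simple counting estimate over the zero-signal ensemble, sketched below with hypothetical inputs. The conversion to standard deviations assumes a one-sided Gaussian tail, which reproduces the correspondence quoted in the abstract between 0.014% and 3.6 standard deviations.

```python
from statistics import NormalDist


def p_value(background_only_results, observed):
    """Fraction of zero-signal pseudodatasets measuring at least `observed`."""
    n_at_least = sum(1 for value in background_only_results if value >= observed)
    return n_at_least / len(background_only_results)


def significance(p):
    """One-sided Gaussian equivalent of a p-value, in standard deviations."""
    return NormalDist().inv_cdf(1.0 - p)


# HYPOTHETICAL measured cross sections (pb) from four zero-signal pseudodatasets.
toy_p = p_value([0.0, 1.0, 2.0, 5.0], observed=2.0)
z = significance(0.00014)   # the 0.014% p-value quoted in the abstract
print(toy_p, round(z, 2))
```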

We use the SM signal ensemble (with systematics) to 
determine the correlations between the three analysis 
methods so we can combine their results. We also use this 
ensemble to calculate the compatibility of our measured 
result with the SM prediction, by determining how many 
pseudodatasets have a measured cross section at least as 
high as the result measured with data. 

The set of ensembles with different values for the
signal cross section is used to assess bias in the cross
section measurement, that is, the difference between
the input cross section and the mean of the distribution
of measured cross sections. For each multivariate
analysis, the bias is estimated by applying the entire
analysis chain to the ensembles of pseudodatasets that
each have a different value for the single top quark
cross section. Straight-line fits of the average of the
measured cross sections versus the input cross section
for the three multivariate analyses are shown in Fig. 16.
From this measurement, we conclude that the bias in all
three analyses is small. Moreover, when compared with
the variances of the ensemble distributions of measured
values, the biases are negligible. We thus perform no
correction to the expected or measured cross section
values.
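
A straight-line fit of this kind can be sketched with an unweighted least-squares formula. The input and mean values below are hypothetical; an unbiased method would give a slope near one and an intercept near zero.

```python
def fit_line(xs, ys):
    """Unweighted least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


inputs = [1.0, 2.0, 3.0, 4.0]   # HYPOTHETICAL input cross sections (pb)
means = [1.1, 2.1, 3.1, 4.1]    # HYPOTHETICAL ensemble means of measurements (pb)
slope, intercept = fit_line(inputs, means)
print(slope, intercept)
```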

XVII. CROSS-CHECK STUDIES 

In order to check the background model, we apply the
multivariate discriminants to two background-dominated
samples defined by the following criteria: (i) 2 jets,
1 b tag, and H_T(ℓ, E̸_T, alljets) < 175 GeV for a "W+jets"
sample; and (ii) 4 jets, 1 b tag, and H_T(ℓ, E̸_T, alljets) >
300 GeV for a "tt̄" sample. The first sample is mostly
W+jets and almost no tt̄, while the second is mostly tt̄
and almost no W+jets.

FIG. 13: For plots (a)-(d), DT and BNN discriminant outputs for tb+tqb in the e+jets channel (left column) and μ+jets
channel (right column) for events with two jets of which one is b tagged. Plots (e) and (f) show the ME discriminant outputs
for tb in the e+jets channel, for two-jet and three-jet events respectively. Plots (g) and (h) show the ME discriminants for tqb
in the b tagged e+jets channel, for two-jet and three-jet events respectively. All histograms are normalized to unity.

The tb+tqb decision tree output distributions for
these cross-check samples are shown in Fig. 17 and the
corresponding Bayesian neural network output distributions
are shown in Fig. 18. From these data-background
comparisons, we conclude that there is no
obvious bias in our measurement. The background model
describes the data within uncertainties.

The matrix element analysis does not use four-jet
events, so the cross-check samples are defined to have
H_T < 175 GeV or H_T > 300 GeV for any number of
jets. Figure 19 shows the s- and t-channel discriminant
outputs for two-jet and three-jet events for the H_T <
175 GeV cross-check samples. The plots have the electron
and muon channels and the one and two b-tag channels
combined for increased statistics. Figure 20 shows the
same for the H_T > 300 GeV samples.

FIG. 14: For plots (a)-(d), DT and BNN tb+tqb signal efficiency versus background efficiency in the e+jets channel (left
column) and μ+jets channel (right column) for events with two jets of which one is b tagged. Plots (e)-(h) show the ME signal
versus background efficiency for tb signal (third row) and tqb signal (fourth row), for b tagged e+jets events with two jets (left
column) and three jets (right column). These curves are derived from the discriminants shown in Fig. 13.

FIG. 15: The discriminant outputs of the three multivariate discriminants: (a) DT, (b) BNN, (c) ME s-channel, and (d) ME
t-channel discriminants. The signal components are normalized to the expected standard model cross sections of 0.88 pb and
1.98 pb for the s- and t-channels, respectively. The hatched bands show the 1σ uncertainty on the background.

FIG. 16: Ensemble average of measured cross section as a function of the input single top quark cross section for the (a) DT,
(b) BNN, and (c) ME analyses.

FIG. 17: DT outputs from the W+jets (upper row) and tt̄ (lower row) cross-check samples for e+jets events (left column) and
μ+jets events (right column).

FIG. 18: BNN outputs from W+jets (upper row) and tt̄ (lower row) cross-check samples for e+jets events (left column) and
μ+jets events (right column).

FIG. 19: H_T < 175 GeV cross-check plots in two-jet (upper row) and three-jet (lower row) events for the s-channel ME
discriminant (left column) and the t-channel ME discriminant (right column). The plots have electrons and muons, one and
two b tags combined.

FIG. 20: H_T > 300 GeV cross-check plots in two-jet (upper row) and three-jet (lower row) events for the s-channel ME
discriminant (left column) and the t-channel ME discriminant (right column). The plots have electrons and muons, one and
two b tags combined.






XVIII. CROSS SECTION MEASUREMENTS 

We use a Bayesian approach [13, HH to extract the
cross section σ(pp̄ → tb + X, tqb + X) from the observed
binned discriminant distributions. In principle, the
binning of data should be avoided because information
is lost. In practice, however, an unbinned likelihood
function is invariably approximate because of the need to
fit smooth functions to the distributions of the unbinned
data. Consequently, the uncertainty in the fits induces an
uncertainty in the likelihood function that grows linearly 
with the number of events. Without study, it is not 
clear whether an unbinned, but approximate, likelihood 
function will yield superior results to those obtained from 
a binned but exact one. Since we have not yet studied 
the matter, we choose to bin the data and avail ourselves 
of an exact likelihood function. 



A. Bayesian Analysis 

For a given bin, the likelihood to observe count D, if
the mean count is d, is given by the Poisson distribution

$$ L(D \mid d) = \frac{e^{-d} \, d^{D}}{\Gamma(D + 1)} \tag{35} $$

where Γ is the gamma function. (We write the Poisson
distribution in this form to permit the use of noninteger
counts in the calculation of expected results. For
observed results, the counts are of course integers.) The
mean count d is the sum of the predicted contributions
from the signal and background sources

$$ d = \alpha \mathcal{L} \sigma + \sum_{i=1}^{N} b_i \equiv a \sigma + \sum_{i=1}^{N} b_i \tag{36} $$

where α is the signal acceptance, ℒ the integrated
luminosity, σ the single top quark production cross
section, b_i the mean count (that is, yield) for background
source i, N the number of background sources, and a ≡
αℒ is the effective luminosity for the signal. For analyses
in which the signal comprises s- and t-channel simulated
events, the latter are combined in the ratio predicted
by the standard model. (Without this assumption, the
probability of count D would depend on the s- and t-channel
cross sections σ_s and σ_t explicitly.)

For a distribution of observed counts, the single-bin
likelihood is replaced by a product of likelihoods

$$ L(\mathbf{D} \mid \mathbf{d}) = L(\mathbf{D} \mid \sigma, \mathbf{a}, \mathbf{b}) = \prod_{i=1}^{M} L(D_i \mid d_i), \tag{37} $$

where D and d represent vectors of the observed and
mean counts, and a and b are vectors of effective
luminosity and background yields. The product is over
M statistically independent bins: either all bins of a
given lepton flavor, b-tag multiplicity, or jet multiplicity,
or all bins of a combination of these channels.
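
Equations 35 to 37 translate directly into a binned log-likelihood. The sketch below works with logarithms for numerical stability and uses the log-gamma function so that noninteger counts are allowed; the counts, effective luminosities, and background yields in the example are hypothetical.

```python
import math


def log_poisson(count, mean):
    """Logarithm of Eq. 35: exp(-d) d^D / Gamma(D + 1), valid for noninteger D."""
    return -mean + count * math.log(mean) - math.lgamma(count + 1.0)


def mean_count(sigma, eff_lumi, backgrounds):
    """Eq. 36: d = a * sigma + sum_i b_i for one bin."""
    return eff_lumi * sigma + sum(backgrounds)


def log_likelihood(sigma, counts, eff_lumis, backgrounds_per_bin):
    """Logarithm of Eq. 37: sum of per-bin log-likelihoods over independent bins."""
    return sum(
        log_poisson(count, mean_count(sigma, a, b))
        for count, a, b in zip(counts, eff_lumis, backgrounds_per_bin)
    )


counts = [12, 7]                          # HYPOTHETICAL observed counts
eff_lumis = [2.0, 1.0]                    # HYPOTHETICAL effective luminosities (pb^-1)
backgrounds = [[5.0, 3.0], [2.0, 2.5]]    # HYPOTHETICAL yields, two sources per bin
logl = log_likelihood(2.0, counts, eff_lumis, backgrounds)
print(logl)
```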

From Bayes' theorem, we can compute the posterior
probability density of the parameters, p(σ, a, b|D), which
is then integrated with respect to the parameters a and
b to obtain the posterior density for the single top quark
production cross section, given the observed distribution
of counts D,

$$ p(\sigma \mid \mathbf{D}) = \frac{1}{\mathcal{N}} \int\!\!\int L(\mathbf{D} \mid \sigma, \mathbf{a}, \mathbf{b}) \, \pi(\sigma, \mathbf{a}, \mathbf{b}) \, d\mathbf{a} \, d\mathbf{b}. \tag{38} $$

Here, 𝒩 is an overall normalization obtained from the
requirement ∫ p(σ|D) dσ = 1, where the integration is
performed numerically up to an upper bound σ_max when
the value of the posterior is sufficiently close to zero.
In this analysis, varying σ_max from 30 to 150 pb has
negligible effect on the result.

The function π(σ, a, b) is the prior probability density,
which encodes our knowledge of the parameters σ, a, and
b. Since our knowledge of the cross section σ does not
inform our prior knowledge of a and b, we may write the
prior density as

$$ \pi(\sigma, \mathbf{a}, \mathbf{b}) = \pi(\mathbf{a}, \mathbf{b}) \, \pi(\sigma). \tag{39} $$

The prior density for the cross section is taken to be
a nonnegative flat prior, π(σ) = 1/σ_max for σ > 0,
and π(σ) = 0 otherwise. We make this choice because
it is simple to implement and yields acceptable results
in ensemble studies (see Sec. XVI). The posterior
probability density for the signal cross section is therefore

$$ p(\sigma \mid \mathbf{D}) = \frac{1}{\mathcal{N} \sigma_{\max}} \int\!\!\int L(\mathbf{D} \mid \sigma, \mathbf{a}, \mathbf{b}) \, \pi(\mathbf{a}, \mathbf{b}) \, d\mathbf{a} \, d\mathbf{b}. \tag{40} $$

We take the mode of p(σ|D) as our measure of the
cross section, and the 68% interval about the mode
as our measure of the uncertainty with which the
cross section is measured. We have verified that these
intervals, although Bayesian, have approximately 68%
coverage probability and can therefore be interpreted as
approximate frequentist intervals if desired.

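The integral in Eq. 40 is done numerically using Monte
Carlo importance sampling. We generate a large number
K of points (a_k, b_k) randomly sampled from the prior
density π(a, b) and estimate the posterior density using

$$ p(\sigma \mid \mathbf{D}) \propto \int\!\!\int L(\mathbf{D} \mid \sigma, \mathbf{a}, \mathbf{b}) \, \pi(\mathbf{a}, \mathbf{b}) \, d\mathbf{a} \, d\mathbf{b} \approx \frac{1}{K} \sum_{k=1}^{K} L(\mathbf{D} \mid \sigma, \mathbf{a}_k, \mathbf{b}_k). \tag{41} $$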
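
The estimate in Eq. 41, followed by the extraction of the mode and a 68% interval, can be sketched as below. Several choices are assumptions made for illustration: the prior is sampled from independent Gaussians rather than the multivariate Gaussian of Sec. XVIII B, the 68% interval is taken as the range of the highest-density grid points, and all counts and yields are hypothetical.

```python
import math
import random


def poisson_likelihood(count, mean):
    """Poisson probability of Eq. 35, written with the gamma function."""
    return math.exp(-mean + count * math.log(mean) - math.lgamma(count + 1.0))


def posterior_on_grid(counts, prior_samples, sigma_grid):
    """Eq. 41 on an equally spaced grid, normalized to unit area."""
    step = sigma_grid[1] - sigma_grid[0]
    values = []
    for sigma in sigma_grid:
        total = 0.0
        for eff_lumis, backgrounds in prior_samples:
            likelihood = 1.0
            for count, a, b in zip(counts, eff_lumis, backgrounds):
                likelihood *= poisson_likelihood(count, a * sigma + b)
            total += likelihood
        values.append(total / len(prior_samples))
    area = sum(values) * step
    return [value / area for value in values]


def mode_and_interval(sigma_grid, posterior, content=0.68):
    """Mode and range of the highest-density grid points holding `content`."""
    step = sigma_grid[1] - sigma_grid[0]
    ranked = sorted(range(len(sigma_grid)), key=lambda i: posterior[i], reverse=True)
    covered, kept = 0.0, []
    for index in ranked:
        kept.append(index)
        covered += posterior[index] * step
        if covered >= content:
            break
    return sigma_grid[ranked[0]], sigma_grid[min(kept)], sigma_grid[max(kept)]


rng = random.Random(3)
counts = [65, 30]   # HYPOTHETICAL observed counts in two bins
# HYPOTHETICAL priors: independent Gaussians of 10% width, kept positive.
prior_samples = [
    ([max(rng.gauss(5.0, 0.5), 1e-9), max(rng.gauss(2.0, 0.2), 1e-9)],
     [max(rng.gauss(50.0, 5.0), 1e-9), max(rng.gauss(25.0, 2.5), 1e-9)])
    for _ in range(500)
]
sigma_grid = [0.1 * i for i in range(151)]   # 0 to 15 pb
posterior = posterior_on_grid(counts, prior_samples, sigma_grid)
mode, low, high = mode_and_interval(sigma_grid, posterior)
print(mode, low, high)
```

Because the same prior samples are reused at every grid point, the estimated posterior is a smooth function of σ even for a modest number of samples.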



In the presence of two signals, we use the same
procedure and calculate the posterior probability density
according to Eq. 40, replacing σ by σ_tb, σ_tqb everywhere.
We also replace the term aσ in Eq. 36 by a_tb σ_tb + a_tqb σ_tqb,
where a_tb and a_tqb are the effective luminosities of the tb
and tqb signals, and σ_tb and σ_tqb are their cross sections.
The prior density for the cross section π(σ) in Eq. 39
becomes π(σ_tb, σ_tqb) = 1/(σ_tb,max + σ_tqb,max) if both σ_tb
and σ_tqb are > 0, and π(σ_tb, σ_tqb) = 0 otherwise. With
these two replacements, the posterior probability density
becomes a two-dimensional distribution as a function of
the two cross sections.



B. Prior Density 

The prior density π(a, b) encodes our knowledge of the
effective signal luminosities and the background yields
(see Sec. X). The associated uncertainties fall into
two classes: those that affect the overall normalization
only, such as the integrated luminosity measurement,
and those that also affect the shapes of the discriminant
distributions, which are the jet energy scale and b-tag
modeling.

The normalization effects are modeled by sampling the
effective signal luminosities a and the background yields
b from a multivariate Gaussian, with the means set to
the estimated yields and the covariance matrix computed
from the associated uncertainties. The covariance matrix
quantifies the correlations of the systematic uncertainties
across different sources of signal and background.

The shape effects are modeled by changing, one at a
time, the jet energy scale and b-tag probabilities by plus
or minus one standard deviation with respect to their
nominal values. Therefore, for a given systematic effect,
we create three model distributions: the nominal one,
and those resulting from the plus and minus shifts. For
each bin, Gaussian fluctuations, with standard deviation
defined by the plus and minus shifts in bin yield, are
generated about the nominal yield, and added linearly
to the nominal yields generated from the sampling of the
normalization-only systematic effects. Since effects such
as a change in jet energy scale affect all bins coherently,
we assume 100% correlation across all bins and sources.
This is done by sampling from a zero mean, unit variance
Gaussian and using the same variate to generate the
fluctuations in all bins.
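
The coherent treatment of a shape systematic can be sketched as follows. The sketch assumes that the per-bin standard deviation is half the difference between the plus- and minus-shifted yields, which is one possible reading of the text; the yields are hypothetical and the normalization-only sampling is omitted.

```python
import random


def shape_fluctuated_yields(nominal, plus, minus, rng):
    """Shift every bin coherently using one unit Gaussian variate.

    ASSUMPTION: the per-bin standard deviation is half the difference
    between the plus-shifted and minus-shifted yields.
    """
    variate = rng.gauss(0.0, 1.0)   # shared by all bins: 100% correlation
    return [n + variate * 0.5 * (p - m) for n, p, m in zip(nominal, plus, minus)]


rng = random.Random(11)
nominal = [100.0, 50.0]   # HYPOTHETICAL nominal bin yields
plus = [110.0, 52.0]      # HYPOTHETICAL yields with the jet energy scale shifted up
minus = [92.0, 47.0]      # HYPOTHETICAL yields with the jet energy scale shifted down
sampled = shape_fluctuated_yields(nominal, plus, minus, rng)
print(sampled)
```

Every bin moves by the same number of standard deviations, so a single draw shifts the whole distribution in one direction rather than smearing bins independently.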



C. Bayes Ratio 

Given two well-defined hypotheses H_0 and H_1
(e.g., the background-only and the signal+background
hypotheses), it is natural in a Bayesian context to
consider a Bayes factor B_10, defined in Eq. 42,
as a way to quantify the significance of hypothesis H_1
relative to H_0. Here,

$$ L(\mathbf{D} \mid \sigma) = \int\!\!\int L(\mathbf{D} \mid \sigma, \mathbf{a}, \mathbf{b}) \, \pi(\mathbf{a}, \mathbf{b}) \, d\mathbf{a} \, d\mathbf{b} \tag{43} $$

is the marginal (or integrated) likelihood and π(σ) is the
cross section prior density, which could be taken as a
Gaussian about the standard model predicted value.

Another possible use of a Bayes factor is as an objective
function to be maximized in the optimization of analyses;
the optimal analysis would be the one with the largest
expected Bayes factor. These considerations motivate a
quantity akin to a Bayes factor that is somewhat easier
to calculate, which we have dubbed a Bayes ratio, defined
by

$$ \text{Bayes ratio} = \frac{p(\hat{\sigma} \mid \mathbf{D})}{p(\sigma = 0 \mid \mathbf{D})} \tag{44} $$

where σ̂ is the mode of the posterior density. The three
analyses are optimized using the expected Bayes ratio,
which is computed by setting the distribution D to the
expected one.
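
Given a posterior density tabulated on a grid of cross section values, the Bayes ratio is a one-line computation. The sketch below reads Eq. 44 as the posterior density at its mode divided by its value at zero cross section; the posterior values are hypothetical.

```python
def bayes_ratio(sigma_grid, posterior):
    """Eq. 44: posterior density at its mode divided by its value at sigma = 0."""
    if sigma_grid[0] != 0.0:
        raise ValueError("the grid must start at sigma = 0")
    return max(posterior) / posterior[0]


# HYPOTHETICAL posterior values on a coarse grid of cross sections (pb).
ratio = bayes_ratio([0.0, 1.0, 2.0, 3.0], [0.1, 0.3, 0.5, 0.1])
print(ratio)
```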

XIX. RESULTS 
A. Expected Sensitivity 

Before making a measurement using data, it is useful 
to calculate the expected sensitivity of these analyses. 
Furthermore, this expected sensitivity is used to optimize 
the choice of parameters in the analyses. For each case 
under consideration we calculate an expected Bayes ratio 
as defined in Sec. IXVIII Cl The highest Bayes ratio 
corresponds to the optimal parameter choice. 

Table IXVI shows the expected Bayes ratio for each 
possible combination of analysis channels in the DT 
analysis. It can be seen from the numbers in the table 
that combining the two single top quark signals (i.e., 
searching for tb+tqb together) results in the best expected 
sensitivity. The single-tag two-jet channel contributes 
the most to this sensitivity, as expected from the high 
signal acceptance and reasonable signal-to-background 
ratio, but the addition of the other channels does improve 
the result; including the poorer ones does not degrade 
it. While the result from this table refers specifically 
to the DT analysis, the conclusions hold for all three 
multivariate techniques. Therefore, from this point 
onward, the 2-4 jets 1-2 tags result, using electrons and 
muons in the tb+tqb channel will be considered as default 
(2-3 jets for the ME analysis). 



Si 



L{T>\Hi) 
L{T>\HoY 
J L(D\a)ir(a)da 
L(D|o- = 0) ' 



(42) 
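
The expected Bayes ratio of Sec. XVIII C can be evaluated numerically once a posterior is tabulated on a grid of cross sections. The sketch below is a deliberately simplified stand-in: it assumes a single-bin counting experiment, a flat nonnegative prior, and no systematic uncertainties (so the marginalization over a and b is skipped). The acceptance and background values, and all function names, are hypothetical.

```python
import math


def poisson_log_likelihood(n, mean):
    """Log Poisson probability, extended to noninteger n through lgamma."""
    return n * math.log(mean) - mean - math.lgamma(n + 1.0)


def posterior_on_grid(n, acceptance, background, sigma_grid):
    """Unnormalized posterior for a flat nonnegative prior in sigma."""
    return [
        math.exp(poisson_log_likelihood(n, acceptance * s + background))
        for s in sigma_grid
    ]


def bayes_ratio(sigma_grid, posterior):
    """Posterior density at its mode divided by its value at sigma = 0."""
    if sigma_grid[0] != 0.0:
        raise ValueError("grid must start at sigma = 0")
    # The normalization of the posterior cancels in the ratio.
    return max(posterior) / posterior[0]


# Hypothetical single-bin counting experiment.
acceptance = 10.0    # expected signal events per pb (assumed value)
background = 100.0   # expected background events (assumed value)
sigma_sm = 2.86      # pb, SM cross section quoted in the text

# "Expected" dataset: noninteger count equal to background plus SM signal.
n_expected = background + acceptance * sigma_sm

grid = [0.01 * i for i in range(1001)]  # 0 to 10 pb in steps of 0.01 pb
expected_ratio = bayes_ratio(
    grid, posterior_on_grid(n_expected, acceptance, background, grid)
)
print(expected_ratio)
```

Comparing `expected_ratio` across candidate analysis configurations, and keeping the configuration with the largest value, mirrors the optimization strategy described above.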
B. Expected Cross Sections 

We measure the expected cross sections for the various 
channels by setting the number of data events in each 
channel equal to the (noninteger) expected number of 
background events plus the expected number of signal 
events (using the SM cross section of 2.86 pb at $m_{\rm top}$ = 
175 GeV), and obtain the following results: 

$$\begin{aligned}
\sigma^{\rm exp}(p\bar{p} \to tb + X,\ tqb + X) &= 2.7^{+\ldots}_{-\ldots}\ \text{pb (DT)} \\
&= 2.7^{+\ldots}_{-\ldots}\ \text{pb (BNN)} \\
&= 2.8^{+\ldots}_{-\ldots}\ \text{pb (ME)}.
\end{aligned}$$

The expected cross sections agree with the input cross 
section. The small deviation, less than 10%, arises from 
the asymmetric nature of several of the systematic 
uncertainties, in particular the jet energy scale and 
b tagging. This effect is also observed in the 
pseudodatasets. 

The linearity of the methods in measuring the 
signal cross section was discussed in Sec. XVII 
and Fig. 16, and no calibration is necessary based on 
those results. 



C. Measured Cross Sections 

The cross sections measured using data with the three 
multivariate techniques are shown in Fig. 21, where each 
measurement represents an independent subset of the 
data, for example, the 2-jet sample with 1 b tag in the 
electron channel. 

The full combination of available channels (the most 
sensitive case) yields the Bayesian posterior density 
functions shown in Fig. 22 and cross sections of: 

$$\begin{aligned}
\sigma^{\rm obs}(p\bar{p} \to tb + X,\ tqb + X) &= 4.9^{+\ldots}_{-\ldots}\ \text{pb (DT)} \\
&= 4.4^{+\ldots}_{-\ldots}\ \text{pb (BNN)} \\
&= 4.8^{+\ldots}_{-\ldots}\ \text{pb (ME)}.
\end{aligned}$$

Figure 23 shows the high-discriminant regions for each 
of the multivariate methods, with the signal component 
normalized to the cross section measured from data. 
A model including a signal contribution fits the 
data better than does a background-only model. 

To further illustrate the excess of data events over 
background in the high-discriminant region, Fig. 24 
shows three variables that are inputs to the DT analysis: 
the invariant mass of the lepton + b-tagged jet + neutrino system, the W 
transverse mass, and the so-called "Q × η" (lepton charge 
times η of the leading untagged jet). They are each shown 
for low discriminant output, high output, and very high 
output. The excess of data over a background-only model 
increases as the discriminant cut is increased. 

The DT analysis has also measured the s- and t-channel 
cross sections separately. The cross sections are 
found to be: 

$$\sigma^{\rm obs}(p\bar{p} \to tb + X) = 1.0 \pm 0.9\ \text{pb}$$
$$\sigma^{\rm obs}(p\bar{p} \to tqb + X) = 4.2^{+\ldots}_{-\ldots}\ \text{pb}.$$

These measurements each assume the standard model 
value of the single top quark cross section not being 
measured, since the s-channel measurement considers the 
t-channel process as a background and vice versa. 

We can remove the constraint of the standard model 
ratio and form the posterior probability density as a 
function of the tb and tqb cross sections. This 
model-independent posterior is shown in Fig. 25 for the DT 
analysis, using the tb+tqb discriminant. The most 
probable value corresponds to cross sections of $\sigma(tb)$ = 
0.9 pb and $\sigma(tqb)$ = 3.8 pb. Also shown are the one, 
two, and three standard deviation contours. While this 
result favors a higher value for the t-channel contribution 
than the SM expectation, the difference is not 
statistically significant. Several models of new physics that 
are also consistent with this result are shown in Ref. [89]. 
TABLE XV: Expected Bayes ratios from the decision tree analysis, including systematic 
uncertainties, for many combinations of analysis channels. The best values from all 
channels combined are shown in bold type. 
FIG. 21: Summaries of the cross section measurements using data from each multivariate technique. The left plot is DT, the 
middle one is BNN, and the right plot is ME. 
FIG. 22: Expected SM and observed Bayesian posterior density distributions, as a function of the tb+tqb cross section [pb], for the DT, BNN and ME analyses. The shaded 
regions indicate one standard deviation above and below the peak positions. 




FIG. 23: Zooms of the high-discriminant output regions of the three multivariate discriminants: (a) DT, (b) BNN, (c) ME 
s-channel (tb), and (d) ME t-channel (tqb) discriminants. The signal component is normalized to the cross section measured from data 
in each case. The hatched bands show the $1\sigma$ uncertainty on the background. 
FIG. 24: The b-tagged top quark mass (top row), W boson transverse mass (second row), and Q(lepton) × η(untag1) (third row) 
for the tb+tqb analysis with a low (< 0.3, left column), high (> 0.55, middle column), and very high (> 0.65, right column) DT 
output, for lepton flavor (e, μ), number of b-tagged jets (1, 2), and jet multiplicity (2, 3, 4) combined. Hatched areas represent 
the systematic and statistical uncertainties on the background model. The signal cross section is the measured one (4.9 pb). 
FIG. 25: Posterior probability density as a function of $\sigma_{tb}$ 
and $\sigma_{tqb}$, when both cross sections are allowed to float in 
the fit of the tb+tqb DT analysis. Shown are the contours of 
equal probability density corresponding to one, two, and three 
standard deviations and the location of the most probable 
value, together with the SM expectation. 



XX. COMBINATION OF RESULTS 

Since each multivariate analysis uses the same dataset 
to measure the single top quark cross section, their results 
are highly correlated. However, because the correlation is 
less than 100%, one can still gain some additional 
sensitivity by combining the results. We combine the 
three cross section measurements, $\sigma_i$ ($i$ = DT, BNN, 
ME), using the best linear unbiased estimate (BLUE) 
method [90, 91, 92]; that is, we take as the new estimate 
of the cross section the weighted sum 

$$\sigma = \sum_i w_i \sigma_i, \tag{45}$$

with $\sum_i w_i = 1$, and with the weights chosen so as to 
minimize the variance 

$$\mathrm{Var}(\sigma) = \sum_i \sum_j w_i w_j\, \mathrm{Cov}(\sigma_i, \sigma_j), \tag{46}$$

where $\mathrm{Cov}(\sigma_i, \sigma_j) = \langle\sigma_i \sigma_j\rangle - \langle\sigma_i\rangle\langle\sigma_j\rangle$ are the matrix 
elements of the covariance matrix of the measurements. 
The variance is minimized when 

$$w_i = \frac{\sum_j \mathrm{Cov}^{-1}(\sigma_i, \sigma_j)}{\sum_i \sum_j \mathrm{Cov}^{-1}(\sigma_i, \sigma_j)}, \tag{47}$$

where $\mathrm{Cov}^{-1}(\sigma_i, \sigma_j)$ denotes the matrix elements of 
the inverse of the covariance matrix. In order to 
estimate the correlation matrix, each analysis is run 
on the same ensemble of pseudodatasets, specifically, 
the SM ensemble with systematics, which comprises 
1,900 pseudodatasets common to all three analyses. To 
estimate the p-value of the combined result, the analyses 
are run on 72,000 pseudodatasets of the background-only 
ensemble. 
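
The BLUE weights follow directly from the covariance matrix of the measurements. The sketch below is a minimal pure-Python illustration, not the code used in the analysis: the helper names `solve_linear`, `blue_weights`, and `combined_variance` are hypothetical, and the two-measurement covariance matrix is an invented example.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def blue_weights(cov):
    """BLUE weights: row sums of the inverse covariance, normalized to one."""
    # Solving cov * x = (1, ..., 1) gives x_i = sum_j Cov^-1(i, j).
    row_sums = solve_linear(cov, [1.0] * len(cov))
    total = sum(row_sums)
    return [x / total for x in row_sums]


def combined_variance(cov, weights):
    """Variance of the weighted sum: sum_i sum_j w_i w_j Cov(i, j)."""
    return sum(
        wi * wj * cov[i][j]
        for i, wi in enumerate(weights)
        for j, wj in enumerate(weights)
    )


# Invented example: two uncorrelated measurements with errors 1 and 2.
cov = [[1.0, 0.0], [0.0, 4.0]]
w = blue_weights(cov)
print(w, combined_variance(cov, w))
```

In the uncorrelated case the weights reduce to the familiar inverse-variance weights, which is a convenient sanity check before applying the same functions to a correlated three-measurement covariance matrix.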



A. Weights, Coverage Probability, and Combined 
Measurement 

We use the SM ensemble with systematics to determine 
the weights $w_i$ and to check the coverage probability 
of the confidence intervals calculated as described in 
Sec. XVIII A. The cross section measurements from 
this ensemble are shown in Fig. 26 for the individual 
and combined analyses. The mean and square root of 
the variance obtained from these distributions give the 
following results: 

$$\begin{aligned}
\sigma^{\text{SM-ens}}(p\bar{p} \to tb + X,\ tqb + X) &= 2.9 \pm 1.6\ \text{pb (DT)} \\
&= 2.7 \pm 1.5\ \text{pb (BNN)} \\
&= 3.2 \pm 1.4\ \text{pb (ME)} \\
&= 3.0 \pm 1.3\ \text{pb (Combined)}.
\end{aligned}$$

The weights $w_i$ for the three analyses are found to be 

• $w_{\rm DT}$ = 0.127, 

• $w_{\rm BNN}$ = 0.386, 

• $w_{\rm ME}$ = 0.488. 

The correlation matrix is 

$$\text{Correlation matrix} = \begin{pmatrix} 1 & 0.66 & 0.64 \\ 0.66 & 1 & 0.59 \\ 0.64 & 0.59 & 1 \end{pmatrix} \begin{matrix} \text{DT} \\ \text{BNN} \\ \text{ME} \end{matrix} \tag{48}$$

and the one-standard-deviation coverage probability of 
the (Bayesian) confidence interval is 0.67. 

The result from combining the DT, BNN, and ME 
measurements for the single top quark cross section is 

$$\sigma^{\rm obs}(p\bar{p} \to tb + X,\ tqb + X) = 4.7 \pm 1.3\ \text{pb (Combined)},$$

using the values listed at the beginning of Sec. XIX C. 
Figure 27 summarizes the measurements of the tb+tqb 
cross section from the individual analyses as well as the 
combination. 
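
As a quick arithmetic check, the combined central value can be reproduced from the quoted weights and the three measured cross sections (the central values as listed in Table XVII). This is only an illustration of the weighted sum of Eq. (45); the variable names are hypothetical.

```python
# Weights and measured cross sections (pb) as quoted in the text.
weights = {"DT": 0.127, "BNN": 0.386, "ME": 0.488}
measured_pb = {"DT": 4.9, "BNN": 4.4, "ME": 4.8}

# Weighted sum: 0.127*4.9 + 0.386*4.4 + 0.488*4.8 = 4.6631
combined_pb = sum(weights[name] * measured_pb[name] for name in weights)
print(round(combined_pb, 1))  # 4.7
```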



B. Measurement Significance 

Having determined the combined result for the single 
top quark cross section, we can now determine the signal 
significance corresponding to this measurement. 
Distributions of the results from all the analyses are shown in 
Fig. 28. 

The expected p-value (and the associated significance 
in Gaussian-like standard deviations) is obtained by 
counting how many background-only pseudodatasets 
yield a measured cross section greater than the SM value 
of 2.86 pb. The result is 1.1% or 2.3 standard deviations, 
as shown in Table XVI. 

FIG. 26: Distributions of the measured cross sections from 
(a) the individual analyses, and (b) the combined analysis, 
using the SM ensemble with systematics. 

FIG. 27: The measured single top quark cross sections from 
the individual analyses and their combination. 

FIG. 28: Distributions of the cross sections measured 
by the three analyses and their combination, using the 
background-only ensemble. The arrow shows the combined 
cross section measurement, 4.7 pb. 



TABLE XVI: The expected tb+tqb cross sections, p-values, 
and significances for the individual and combined analyses, 
using the SM value of 2.86 pb for the single top quark 
production cross section as the reference point in Fig. 28. 

Expected Results 

Analysis    Expected cross    Expected    Expected significance 
            section [pb]      p-value     (std. dev.) 
DT          2.7               0.018       2.1 
BNN         2.7               0.016       2.2 
ME          2.8               0.031       1.9 
Combined    2.8               0.011       2.3 


The observed p-value is similarly calculated by 
counting how many background-only pseudodatasets 
result in a cross section above the value of 4.7 pb 
measured from data. The result is 0.014% or 3.6 standard 
deviations. The observed cross sections, p-values, and 
significances from all the analyses are summarized in 
Table XVII. 
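
The counting procedure and the conversion of a p-value into Gaussian-equivalent standard deviations can be sketched as follows. This is an illustration only: the function names are hypothetical, a one-sided Gaussian tail is assumed for the conversion, and pseudodatasets exactly at the threshold are counted as exceeding it.

```python
from statistics import NormalDist


def p_value(background_only_results, threshold):
    """Fraction of background-only pseudodatasets at or above the threshold."""
    n_above = sum(1 for x in background_only_results if x >= threshold)
    return n_above / len(background_only_results)


def significance(p):
    """One-sided Gaussian-equivalent significance for a p-value."""
    return NormalDist().inv_cdf(1.0 - p)


# p-values quoted in the text for the combined analysis.
print(round(significance(0.011), 1))    # 2.3 (expected)
print(round(significance(0.00014), 1))  # 3.6 (observed)
```

The same `p_value` function applied to the SM ensemble with systematics, with the observed cross section as threshold, gives the compatibility probabilities discussed in the next paragraph.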

Finally, using the SM ensemble with systematics, we 
quantify the compatibility of our result with the SM 
expectation by counting how many pseudodatasets result 
in a cross section with the observed value or higher for 
each of the analyses. The probabilities for the different 
analyses are 10% for the DT analysis, 13% for the ME 
analysis, 13% for the BNN analysis, and 10% for the 
combined analysis. 
TABLE XVII: The cross sections measured from data, 
p-values, and significances for the individual and combined 
analyses, the latter two obtained using the background-only 
ensemble. 

Observed Results 

Analysis    Measured cross    Measured    Measured significance 
            section [pb]      p-value     (std. dev.) 
DT          4.9               0.00037     3.4 
BNN         4.4               0.00083     3.1 
ME          4.8               0.00082     3.2 
Combined    4.7               0.00014     3.6 


C. Discriminant Comparison 



In order to compare the expected performance of the 
three multivariate techniques, it is instructive to compute 
a power curve for each method using the two hypotheses 
$H_1$ = SM-signal+background and $H_0$ = background only. 
The power curve in Fig. 29 is a plot of the probability to 
accept hypothesis $H_1$, if it is true, versus the significance 
level, that is, the probability to reject hypothesis $H_0$, if 
it is true. Figure 29 shows that all three methods exhibit 
comparable performance. 
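
A power curve of this kind can be built from two ensembles of measured cross sections by scanning a reference cross section. The sketch below assumes that each ensemble is available as a plain list of measured values; the function name `power_curve` and the toy numbers are hypothetical.

```python
def power_curve(background_only, signal_plus_background, references):
    """Return (significance level, power) pairs for each reference value.

    significance level: fraction of background-only results at or above
    the reference, i.e. the probability to reject H0 when it is true.
    power: fraction of signal+background results at or above the
    reference, i.e. the probability to accept H1 when it is true.
    """
    points = []
    for ref in references:
        alpha = sum(x >= ref for x in background_only) / len(background_only)
        power = sum(x >= ref for x in signal_plus_background) / len(
            signal_plus_background
        )
        points.append((alpha, power))
    return points


# Toy ensembles of measured cross sections in pb (invented values).
print(power_curve([0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0], [2.5]))
```

For a fixed significance level, the method whose curve lies highest has the largest power, which is the comparison made between the three techniques.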
FIG. 29: The p-value computed from the 
SM-signal+background ensemble versus the p-value from 
the background-only ensemble for reference cross sections 
varying monotonically from 0-10 pb. For a given 
significance, that is, the probability to reject the background-only 
hypothesis if true, the power is the probability to accept 
the signal+background hypothesis if it is true. For a given 
significance, one wants the power to be as large as possible. 



XXI. $|V_{tb}|$ MEASUREMENT 

Within the SM with three generations of quarks, the 
charged-current interactions of the top quark are of the 
type V-A, and involve a W boson and a down-type quark 
q (q = d, s, b): 

$$\Gamma^{\mu}_{Wtq} = -\frac{g}{\sqrt{2}}\, V_{tq}\, f_1^L\, \bar{u}(p_q)\, \gamma^{\mu} P_L\, u(p_t), \tag{49}$$

where $|V_{tq}|$ is one of the elements of the 3×3 unitary 
CKM matrix [...], $f_1^L = 1$ in the SM, and $P_L = (1 - \gamma_5)/2$ 
is the left-handed (−) projection operator. Under 
the assumption of three generations and a unitary CKM 
matrix, the $|V_{tq}|$ elements are severely constrained [93]: 

$$\begin{aligned}
|V_{td}| &= (8.14^{+0.32}_{-0.64}) \times 10^{-3} \\
|V_{ts}| &= (41.61^{+0.12}_{-0.78}) \times 10^{-3} \\
|V_{tb}| &= \ldots
\end{aligned} \tag{50}$$
In several extensions of the SM involving, for example, 
a fourth generation of quarks or an additional heavy 
quark singlet that mixes with the top quark, the 3×3 
CKM matrix is no longer required to be unitary, and 
$|V_{tb}|$ can be significantly smaller than unity [19]. 

This paper describes in detail the first direct 
measurement of $|V_{tb}|$, based on the single top quark 
production cross section measurement using decision 
trees [25]. The $|V_{tb}|$ measurement is a relatively 
straightforward extension of the cross section measurement using 
the same dataset and analysis infrastructure, since the 
cross section for single top quark production is directly 
proportional to $|V_{tb}|^2$. This measurement of $|V_{tb}|$ makes 
no assumptions on the number of generations or unitarity 
of the CKM matrix. However, some assumptions are 
made in the generation of our signal MC samples and the 
extraction of $|V_{tb}|$ from the cross section measurement. 
In particular, we assume the following: (i) there are 
only SM sources of single top quark production; (ii) top 
quarks decay to Wb; and (iii) the Wtb interaction is 
CP-conserving and of V-A type. We discuss these 
assumptions in more detail here. 

First, we assume that the only production mechanism 
for single top quarks involves an interaction with a 
W boson. Therefore, extensions of the SM where single 
top quark events can be produced, for example, via 
flavor-changing neutral current interactions [94] or heavy 
scalar or vector boson exchange [95], are not considered 
here. 

The second assumption is that $|V_{td}|^2 + |V_{ts}|^2 \ll |V_{tb}|^2$. 
In other words, we assume $|V_{ts}|$ and $|V_{td}|$ are negligible 
compared to $|V_{tb}|$, without making any assumption on 
the magnitude of $|V_{tb}|$. This is reasonable given the 
measurements of 

$$R = \frac{|V_{tb}|^2}{|V_{td}|^2 + |V_{ts}|^2 + |V_{tb}|^2} \tag{51}$$

by the CDF [96] and D0 [97] collaborations, obtained by 
comparing the rates of $t\bar{t}$ events with zero, one and two 
b-tagged jets. For instance, D0's measurement results in 
$|V_{td}|^2 + |V_{ts}|^2 = (-0.03^{+\ldots}_{-\ldots})\,|V_{tb}|^2$. The requirement 
that $|V_{td}|^2 + |V_{ts}|^2 \ll |V_{tb}|^2$ implies that $B(t \to Wb) \simeq$ 
100% and that single top quark production is completely 
dominated by the Wtb interaction. This assumption is 
made explicitly when measuring the combined tb+tqb 
cross section when assuming the SM ratio of $\sigma_{tb}/\sigma_{tqb}$ [...], 
as well as in the generation of single top quark and $t\bar{t}$ 
simulated samples. 
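
The link between the measured ratio R and the combination $|V_{td}|^2 + |V_{ts}|^2$ can be made explicit with one line of algebra. The following assumes that R is the ratio of $|V_{tb}|^2$ to the sum of the three squared elements, as in Eq. (51):

```latex
R = \frac{|V_{tb}|^2}{|V_{td}|^2 + |V_{ts}|^2 + |V_{tb}|^2}
\quad\Longrightarrow\quad
|V_{td}|^2 + |V_{ts}|^2 = \left(\frac{1}{R} - 1\right) |V_{tb}|^2 .
```

A measured R close to unity therefore corresponds to a coefficient close to zero, and a central value of R slightly above unity gives a slightly negative coefficient, which is allowed within the measurement uncertainties.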

Finally, we assume that the Wtb vertex is charge-parity 
(CP) conserving and of the V-A type as given in Eq. 49, 
but it is allowed to have an anomalous strength $f_1^L$. We 
do not allow for right-handed or tensor couplings that 
may occur in the most general Wtb vertex [98, 99]. The 
simulated samples can still be used under the assumption 
of an anomalous $f_1^L$ coupling: the $t\bar{t}$ cross section and 
kinematics, as well as the tb and tqb kinematics, are 
completely unaffected. An anomalous value for $f_1^L$ would 
only rescale the single top quark cross section, allowing 
it to be larger or smaller than the SM prediction, even 
under the assumption of $|V_{tb}| = 1$. Therefore, strictly 
speaking, we are measuring the strength of the V-A 
coupling, i.e., $|V_{tb} f_1^L|$, which is allowed to be > 1. 
Limiting our measurement to the [0, 1] range implies the 
additional assumption that $f_1^L = 1$. 



A. Statistical Analysis 

This measurement uses exactly the same machinery 
as used to obtain the single top quark cross section 
posterior. Following standard convention for parameters 
that multiply the cross section, we choose a prior that 
is nonnegative and flat in $|V_{tb}|^2$, which means it is flat 
in the cross section. However, in one of the two cases 
presented below, we restrict the prior to the SM allowed 
region [0,1]. 



B. Systematic Uncertainties 

In order to extract $|V_{tb}|$ from the measured cross 
section, additional theoretical uncertainties [...] need to be 
considered. These uncertainties are applied separately to 
the tb and tqb samples in order to take the correlations 
into account properly. They are listed in Table XVIII. 
The uncertainty on the top quark mass of 5.1 GeV [76] 
is used when estimating the $t\bar{t}$ cross section uncertainty 
and the tb and tqb cross section uncertainties. 



TABLE XVIII: Systematic uncertainties on the 
cross section factor required to extract $|V_{tb}|$. 

Additional $|V_{tb}|$ Uncertainties 

                        tb       tqb 
Top quark mass          8.5%     13.0% 
Factorization scale     4.0%     5.5% 
Parton distributions    4.5%     10.0% 
$\alpha_s$              1.4%     0.01% 



C. $|V_{tb}|$ Result 

The measurement of the CKM matrix element is 
obtained from the most probable value of $|V_{tb}|^2$, given 
by $|V_{tb}| = \sqrt{|V_{tb}|^2}$, and the uncertainty is computed 
as $\Delta|V_{tb}| = \Delta|V_{tb}|^2 / (2|V_{tb}|)$. We have used the decision 
tree result to derive a posterior for $|V_{tb}|$. The posterior 
with the prior restricted only to be nonnegative gives 
$|V_{tb} f_1^L|^2 = 1.72^{+0.64}_{-0.54}$, which results in 

$$|V_{tb} f_1^L| = 1.31^{+0.25}_{-0.21}.$$

The posterior with the prior restricted to the [0,1] region 
gives $|V_{tb}|^2 = 1.00^{+0.00}_{-\ldots}$, which results in 

$$|V_{tb}| = 1.00^{+0.00}_{-\ldots}.$$

The corresponding 95% C.L. lower limit on $|V_{tb}|^2$ is 0.46, 
corresponding to a lower limit of 

$$|V_{tb}| > 0.68.$$

The posterior densities for $|V_{tb}|^2$ for each choice of prior 
are shown in Fig. 30. 
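
The conversion from the squared quantity to $|V_{tb}|$ and the linear propagation of its uncertainty can be checked numerically. The sketch below uses the central value 1.72 and the downward uncertainty 0.54 of the squared quantity, and the 95% C.L. lower limit 0.46, as quoted in this section; the function name `vtb_from_squared` is hypothetical.

```python
import math


def vtb_from_squared(v_squared, delta_v_squared):
    """Return (|V|, delta|V|) using |V| = sqrt(|V|^2) and
    delta|V| = delta|V|^2 / (2 |V|)."""
    v = math.sqrt(v_squared)
    return v, delta_v_squared / (2.0 * v)


v, dv_down = vtb_from_squared(1.72, 0.54)
print(round(v, 2), round(dv_down, 2))  # 1.31 0.21

# The 95% C.L. lower limit is converted with a plain square root.
print(round(math.sqrt(0.46), 2))  # 0.68
```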
FIG. 30: The posterior density distributions for $|V_{tb}|^2$ for (a) 
a nonnegative flat prior, and (b) a flat prior restricted to the 
region [0,1] and assuming $f_1^L = 1$. The dashed lines show 
the positions of the one, two, and three standard deviation 
distances away from the peak of each curve. 
XXII. SUMMARY 

Using approximately 0.9 fb$^{-1}$ of D0 data, we have 
performed an analysis of events with a single isolated 
lepton (electron or muon), missing transverse energy, and 
2-4 jets (1 or 2 of them b tagged). Using three different 
multivariate techniques, decision trees, Bayesian neural 
networks, and matrix elements, we have searched for 
single top quark events from the s-channel (tb) and 
t-channel (tqb) processes combined. We measure the cross 
section to be 

$$\sigma(p\bar{p} \to tb + X,\ tqb + X) = 4.7 \pm 1.3\ \text{pb}.$$

This corresponds to an excess with a significance of 3.6 
Gaussian-equivalent standard deviations and constitutes the first 
evidence of a single top quark signal. Ensemble tests have 
shown this result to be compatible with the standard 
model cross section with 10% probability. 

The decision tree cross section result has been used 
to extract the first direct measurement of the CKM 
matrix element $|V_{tb}|$. This result does not assume 
three-generation unitarity of the matrix. The 
model-independent measurement is 

$$|V_{tb} f_1^L| = 1.31^{+0.25}_{-0.21},$$

where $f_1^L$ is a generic left-handed vector coupling. If we 
constrain the value of $|V_{tb}|$ to the standard model region 
(i.e., $|V_{tb}| \le 1$ and $f_1^L = 1$), then at 95% C.L., $|V_{tb}|$ has 
been measured to be 

$$0.68 < |V_{tb}| \le 1.$$